Executive Summary



The Square Kilometre Array (SKA) will have a low frequency component (AA-low/SKA-low¹),
which has as one of its main science goals the study of the redshifted 21cm line from the
earliest phases of star and galaxy formation in the Universe (see SKA Memo 125). It is during 
this phase that the first building blocks of the galaxies that we see around us today, including 
our own Milky Way, were formed. It is a crucial period for understanding the history of the 
Universe and one for which we have currently very little observational data. 
We divide the period into two different phases based on the physical processes which affect 
the Intergalactic Medium. The first period, called the Cosmic Dawn, saw the formation of the 
first stars and accreting black holes, which changed the state of the still neutral Intergalactic 
Medium. The second period, known as the Epoch of Reionization, started when relatively large 
areas between the galaxies had become ionized by the radiation produced in those galaxies. 
Observations of the redshifted 21cm line with SKA will provide a new and unique window on
the entire period of Cosmic Dawn and Reionization. The signal is sensitive to the emergence 
of the first stellar populations, radiation from growing massive black holes and the formation 
of larger groups of galaxies and bright quasars. At the same time it maps the distribution of 
most of the baryonic matter in the Universe. The study of the redshifted 21cm line will teach 
us fundamental new things about the earliest phases of structure formation, cosmology and 
even has the potential to lead to the discovery of new physical phenomena. Here we present 
an overview of the science questions that SKA-low can address, how we plan to tackle these 
questions and what this implies for the basic design of the telescope. 

The redshifted 21cm signal will be analyzed with different techniques, which each come with 
their own requirements for the SKA: (i) Tomography, (ii) power-spectra and higher-order statis- 
tics, (iii) hydrogen absorption, (iv) global/total-intensity signal. Whereas all precursors/path- 
finders aim to study the signal statistically through its power spectrum, SKA will be able to 
image the neutral hydrogen distribution directly and its focus will therefore be more on tomography.
This introduces somewhat different requirements for the design of the radio interferometer
than power-spectrum studies do. At the same time the SKA will have enough collecting
area to explore lower frequencies and thus earlier epochs than any of its precursors/pathfinders. 
Through both of these improvements SKA will revolutionize the study of the Cosmic Dawn and 
Reionization. 

We argue that for an optimal study of the 21cm signal through the period of the Cosmic Dawn
and the Epoch of Reionization (Memo 125), a basic reference design for SKA-low should have 
at least the following: 

1. An absolute minimal frequency range 54-190 MHz; an optimal frequency range 54- 
215 MHz and a wide frequency range of 40-240 MHz. 

2. A frequency resolution of ~1 kHz.

¹ We will use both names throughout the White Paper, mostly indicating the very low frequency (≲ 250 MHz)
part of the SKA array interesting for HI studies at redshift z ≳ 5.
3. A physical collecting area $A_{\rm coll} \gtrsim 1\,{\rm km}^2 \times (\nu_{\rm crit}/100\,{\rm MHz})^2$ for $\nu_{\rm crit} < 100$ MHz and at
least 1 km² for $\nu_{\rm crit} > 100$ MHz.



4. A critical frequency ($\nu_{\rm crit}$; corresponding to a $\lambda/2$ size of a receiver dipole) around
100 MHz.

5. A core area with a diameter of ≲ 5 km with most collecting area (~75%) inside the inner
2 km.

6. A set of longer baselines (~ 10-20% of the core collecting area) out to ~ 100 km for 
calibration, ionospheric modeling and for building a detailed sky model. 

7. A station size of order ~35 m which corresponds to a 2.5-10 degree field-of-view from 
200 MHz down to 50 MHz. 
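
As a rough consistency check on the station size quoted in item 7, the field of view can be estimated from the diffraction limit, FoV ≈ λ/D. The sketch below assumes this simple approximation (no aperture taper or beam-shape factor); the function names are illustrative and not part of any SKA software. It also evaluates the λ/2 dipole size that goes with the critical frequency in item 4.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in m/s


def station_fov_deg(freq_mhz: float, station_diameter_m: float = 35.0) -> float:
    """Approximate field of view in degrees, assuming FoV ~ lambda / D."""
    wavelength_m = C_M_PER_S / (freq_mhz * 1e6)
    return math.degrees(wavelength_m / station_diameter_m)


def half_wave_dipole_m(freq_mhz: float) -> float:
    """Length of a lambda/2 dipole at the given frequency, in metres."""
    return 0.5 * C_M_PER_S / (freq_mhz * 1e6)


if __name__ == "__main__":
    for freq in (200.0, 100.0, 50.0):
        print(f"{freq:6.1f} MHz: FoV ~ {station_fov_deg(freq):5.2f} deg")
    print(f"lambda/2 at 100 MHz ~ {half_wave_dipole_m(100.0):.2f} m")
```

Under these assumptions a 35 m station gives roughly 2.5 degrees at 200 MHz and roughly 10 degrees at 50 MHz, in line with the range quoted in item 7.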

The proposed basic SKA-low array design allows most Cosmic Dawn and Epoch of Reioniza- 
tion science goals described in this white paper to be reached within 1000 hrs of observing time, 
but the capabilities of this new and unprecedented radio telescope will undoubtedly also raise 
many new and exciting scientific questions. 
1 Introduction and motivation



This white paper² is meant to provide a background for the development of the Square Kilometre
Array from the point of view of research on reionization and the cosmic dawn. Since
the writing of the SKA Science book in 2004 there has been major progress in the field and we 
felt there was a need for an update. At the same time the road to the construction of the SKA 
is becoming more and more clear, with the official SKA Organization having been founded in 
November 2011, the results of the site selection process having been announced in May 2012 
(with SKA-low being built in Australia) and with an SKA Director-General having been appointed
in September 2012.

Given these developments, the European SKA Epoch-of-Reionization Science Working Group 
(SKA-EoR-SWG) felt it was timely to summarize what Cosmic-Dawn/EoR science can be done 
with the SKA, how it can be done and what this implies for the design of the telescope. The
content of the current version is mostly based on experience in Europe, but we envisage it
becoming a 'living document' and welcome contributions from the wider (global) community, so as
to strive further for the best possible design for an SKA-low that accomplishes the science laid
out in this White Paper and motivated by the goals of SKA Memo 125.

We intend to update this White Paper on a regular basis to reflect progress in the field and 
developments within the SKA project. 



² The first draft of this white paper was written during a three-day workshop at the Oskar Klein Centre in
Stockholm, January 18-20, 2012.
2 Science



Studies of the earliest epochs of star formation in the Universe are one of the major frontiers 
of modern astronomy and cosmology. After decoupling from radiation the matter cooled and 
its density decreased as the Universe further expanded, starting a period called the Dark Ages, 
named after the absence of any light sources. The small density fluctuations left over from 
inflation grew under the force of gravity to eventually form the first nonlinear structures of 
dark and baryonic matter. In these first halos the gas collapsed to form the first stars. Merging 
and accretion gradually led to the formation of larger and larger structures, up to the scales 
of small galaxies. The formation of those first sources of radiation ultimately changed the 
Universe from the smallest to the largest scales, and represents its last global transition, from 
a cold and neutral state to mostly warm and ionized. This process is referred to as the Epoch 
of Reionization (EoR) and was likely quite extended in time. Presently we only have indirect
observations of this process, apart from the detection of some rare sources (see Section 2.1), and
much remains unclear about its timing and duration, as well as the nature of the main sources 
of ionizing photons. In this document we will follow the most recent theoretical models and 
indirect observables as a guidance. 

The ΛCDM model of the Universe predicts that the very first luminous objects may have appeared
around a redshift of 50, but it took until much later, z < 15, before substantial ionization
of the Intergalactic Medium (IGM) occurred. This transitional period after the formation of
the first luminous sources and before substantial ionization of the IGM, we will call the Cosmic
Dawn³. During this era ultra-violet radiation from the first generations of stars was capable
of gradually changing the quantum state of the cold neutral hydrogen, making it observable
in 21cm absorption. The first generations of X-ray sources formed from the first generations
of stars and subsequently heated the IGM, changing the HI signal from absorption to emission.
Around the same time, or slightly later, the individual, small regions of ionized hydrogen around
galaxies started to percolate, both due to the strong clustering of the first sources (Figure 1, left)
and the exponential growth of structures. This led to the formation of giant regions of ionized
hydrogen, up to several tens of comoving Mpc (cMpc⁴) across (Figure 1, right), which ultimately
overlapped to complete reionization around redshift z ~ 6.

The SKA will observe this era using the redshifted 21cm line of neutral hydrogen. The brightness
of this line as produced in the intergalactic medium can be written as (Field 1959; Madau
et al. 1997):

$$\delta T_b = \frac{3 h_p c^3 A_{21\mathrm{cm}}}{32\pi k_B \nu_{21\mathrm{cm}}^2}\, \frac{n_{\rm HI}}{(1+z)H(z)} \left(1 - \frac{T_{\rm CMB}(z)}{T_s}\right) \left(1 + \frac{1}{H(z)}\frac{dv_\parallel}{dr_\parallel}\right)^{-1} \qquad (1)$$

where $h_p$ is Planck's constant, $c$ the speed of light, $k_B$ the Boltzmann constant; $A_{21\mathrm{cm}}$ and
$\nu_{21\mathrm{cm}}$ are the Einstein A-coefficient and frequency of the 21cm transition, respectively. The
cosmological parameters entering the equation are the redshift-dependent Hubble parameter,
$H(z)$, the Cosmic Microwave Background (CMB) temperature, $T_{\rm CMB}(z)$, and the redshift $z$ of
the signal. The gas properties are given by the HI number density, $n_{\rm HI}$, the proper gradient
along the line of sight of the peculiar velocity, $dv_\parallel/dr_\parallel$, and the spin (or excitation) temperature
of the 21cm transition, $T_s$.

³ Sometimes this period is called the Late Dark Ages but this is confusing as the Universe at those times did
contain sources of radiation and therefore was no longer truly dark.

⁴ We will use cMpc for comoving Mpc and pMpc for proper Mpc; without any prefix Mpc means cMpc.

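Using cosmological parameters to express the density in terms of the overdensity $\delta = \rho/\langle\rho\rangle - 1$
and scaling to canonical values we obtain

$$\delta T_b \approx 27\, x_{\rm HI} (1+\delta) \left(\frac{1+z}{10}\right)^{1/2} \left(1 - \frac{T_{\rm CMB}(z)}{T_s}\right) \left(\frac{\Omega_b}{0.044}\frac{h}{0.7}\right) \left(\frac{\Omega_m}{0.27}\right)^{-1/2} \left(\frac{1-Y_p}{1-0.248}\right) \left(1 + \frac{1}{H(z)}\frac{dv_\parallel}{dr_\parallel}\right)^{-1} {\rm mK}, \qquad (2)$$

with $x_{\rm HI}$ the neutral hydrogen fraction, $\Omega_m$ and $\Omega_b$ the total matter and baryon density in terms
of the critical density, and $Y_p$ the primordial helium abundance by mass.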
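
To make the scaled expression (Equation 2) concrete, the following minimal sketch evaluates it numerically. It assumes a present-day CMB temperature of 2.725 K, which is not stated in the text, and the function and parameter names are purely illustrative.

```python
import math

T_CMB_TODAY_K = 2.725  # assumed present-day CMB temperature (not given in the text)


def delta_tb_mk(x_hi, delta, z, t_spin_k, dv_dr_over_h=0.0,
                omega_b=0.044, h=0.7, omega_m=0.27, y_p=0.248):
    """Differential brightness temperature in mK from the scaled relation.

    dv_dr_over_h is the line-of-sight peculiar velocity gradient divided by H(z).
    """
    t_cmb_k = T_CMB_TODAY_K * (1.0 + z)
    return (27.0 * x_hi * (1.0 + delta)
            * math.sqrt((1.0 + z) / 10.0)
            * (1.0 - t_cmb_k / t_spin_k)
            * (omega_b / 0.044) * (h / 0.7)
            / math.sqrt(omega_m / 0.27)
            * (1.0 - y_p) / (1.0 - 0.248)
            / (1.0 + dv_dr_over_h))


if __name__ == "__main__":
    # Neutral gas at mean density with T_s >> T_CMB: emission saturates near 27 mK at z = 9.
    print(delta_tb_mk(x_hi=1.0, delta=0.0, z=9.0, t_spin_k=math.inf))
    # Cold gas (illustrative T_s = 10 K) at z = 19: the signal is seen in absorption.
    print(delta_tb_mk(x_hi=1.0, delta=0.0, z=19.0, t_spin_k=10.0))
```

The two calls illustrate the sign change discussed in the text: the signal is positive (emission) when the spin temperature exceeds the CMB temperature and negative (absorption) when it lies below it.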
The 21cm radiation thus provides us with information on the ionization state of the IGM, its
density, the line of sight (LOS) velocity gradient and the spin temperature. The latter couples
strongly to the kinetic gas temperature when a sufficiently high flux of UV photons is available
(the Wouthuysen-Field effect, as explained in Section 3.1.3), or in regions of sufficiently high
density. Furthermore, the observed frequency contains information about the emission redshift,
which along with the sky position will allow three-dimensional tomography of the IGM (Madau
et al. 1997). This will help us to answer important questions on early galaxy formation, the
state of the intergalactic medium, cosmology and perhaps even lead to the discovery of new,
unexpected physical phenomena.
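
The mapping between observed frequency and emission redshift follows from ν_obs = ν_21cm/(1+z). The short sketch below applies it to the frequency ranges listed in the Executive Summary; the helper names are illustrative only.

```python
NU_21CM_MHZ = 1420.405751768  # rest frequency of the HI 21cm line in MHz


def redshift_from_freq(obs_freq_mhz: float) -> float:
    """Emission redshift of 21cm radiation observed at obs_freq_mhz."""
    return NU_21CM_MHZ / obs_freq_mhz - 1.0


def freq_from_redshift(z: float) -> float:
    """Observed frequency in MHz of 21cm radiation emitted at redshift z."""
    return NU_21CM_MHZ / (1.0 + z)


if __name__ == "__main__":
    for freq in (40.0, 54.0, 190.0, 215.0, 240.0):
        print(f"{freq:6.1f} MHz -> z = {redshift_from_freq(freq):5.2f}")
```

With these numbers the minimal 54-190 MHz range corresponds to redshifts of about 25 down to about 6.5, and the wide 40-240 MHz range to about 34.5 down to about 4.9.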



2.1 First generations of galaxies 

The measurements with SKA will provide a unique window into the properties of the first 
generations of galaxies. Optical/near-infrared observations have been successful in detecting 
galaxies from redshifts as high as 8, or perhaps even 10 (Bouwens et al. 2010, 2011a). These
observations suggest that by that time some fairly substantial galaxies had already developed
and that star formation had been ongoing for at least $10^8$ years before that epoch (Labbé et al.
2010). However, since these galaxies are faint, only the tip of the iceberg can be detected with
current telescopes and the detected galaxies cannot by themselves have reionized the Universe.
Extrapolating from the observed galaxies to fainter ones requires assumptions about the highly
uncertain faint end slope of the luminosity function, leading to debates on whether star forming
galaxies can have been responsible for the reionization of the Universe at all (see e.g. Lorenzoni
et al. 2011; Bouwens et al. 2011b).



The 21cm observations will approach the problem from a different angle as the removal of neu- 
tral hydrogen from the IGM will depend on the integrated extreme ultra-violet (EUV) flux of all 
sources. Under the assumption that star formation in galaxies was responsible for reionization, 
we will thus be able to measure the combined effect of all galaxies and map out the cosmic star 
formation rate during the epoch of reionization. The morphologies of the HII regions may even 
help us characterize the dominant types of galaxies responsible, as galaxies of different masses 
have different clustering properties. Morphology may also help in establishing whether sources 
Figure 1: Left panel: Cosmic Web at redshift z = 8 from an N-body simulation with boxsize
20 $h^{-1}$ cMpc and $5488^3$ = 165 billion particles resolving the halos hosting the first stars ($M > 10^5 M_\odot$).
Shown are projections of the total density (green) and halos (orange). Right panel: Spatial slices of the
ionized and neutral gas density at z = 8 from a radiative transfer simulation with volume 425 $h^{-1}$ cMpc to
a side. Shown are the density field (green) overlaid with the ionized fraction (red/orange/yellow) and
the cells containing sources (dark/blue). Courtesy of I. T. Iliev and G. Mellema.



other than stars played an important role, such as rare bright quasi-stellar objects (QSOs). Fur- 
thermore, the distribution of ionized regions will provide a crude map of the cosmic web of 
structure at these early epochs as simulations show that even then the sources concentrated 
along filaments. 

QSOs, powered by accretion onto a central supermassive black hole (SMBH) are the most 
extreme of the class of objects producing very hard (X-ray) radiation. Due to the frequency 
dependence of the hydrogen ionization cross-section, this radiation is more efficient at heat- 
ing the IGM than at ionizing it. Different heating histories then could be traced through the 
redshifted 21cm signal as the strength of the signal depends on the spin temperature, which in 
turn depends on the gas temperature (see Section 3.1 for further discussion). Mapping out the
temperature evolution of the IGM before full ionization from initially cold to warm will thus
provide another diagnostic on the evolution of galaxies and their constituents (e.g. Santos et al.
2010; Baek et al. 2010; Ciardi et al. 2011; Pritchard & Loeb 2011). Since the energy required to
heat the IGM above the CMB temperature is less than 1 eV per baryon, the expectation is that
this heating happened before substantial ionization. We will thus be able to extend our history 
of structure formation well beyond redshift 10, perhaps as far as 20. 

Most likely before any substantial heating, ultra-violet radiation from the very first generations 
of stars was capable of decreasing the spin temperature of the cold neutral hydrogen from the 
CMB temperature to its kinetic temperature, thus making it observable in absorption. The 
fluctuations in the 21cm signal caused by the patchiness of this process carry information about 
the distribution of these first generations of stars. This signal originates most likely from even
before redshift 20. 



2.2 Evolution of the Intergalactic Medium 



The SKA measurements of the 21cm signal will in the first place provide information on the in- 
tergalactic medium. The size and distribution of ionized regions will give us information about 
how patchy and extended the reionization process was, relevant for understanding the temperature
structure of the IGM for a substantial period after reionization (Theuns et al. 2002; Hui
& Haiman 2003). Images of the 21cm signal in neutral regions will show the level of density
fluctuations in the IGM, essentially a masked version of the baryonic density distribution. 
The signal before substantial ionization should give us an even clearer picture of the baryonic 
density power spectrum, as well as information about the temperature distribution. The 21cm 
signal is the only way to get information about the large scale IGM, the environment which
forms the initial condition for galaxy formation, and the Cosmic Dawn/EoR is the last epoch when
two-dimensional maps of the IGM at different redshifts can be made.

It is important to stress that these measurements will provide us with unique information on 
the structure of the Universe. Even today, most of the matter is not locked up in galaxies, but 
is distributed between them, and only tiny fractions of it are observable. During the Cosmic 
Dawn and the EoR the collapsed fraction was less than 1% and the 21cm observations can thus 
in principle map out the three-dimensional distribution of matter in the Universe at that age. 
It will provide an important check on our current ideas about structure formation according to
the ΛCDM model, as the density fluctuations during these epochs were the result of the action of
gravity on the density fluctuations observed in the CMB.

A very relevant example of this is the recent prediction of supersonic bulk flows in the neutral
hydrogen on scales of a few cMpc with large scale variations on scales of ~100 cMpc
(Tseliakhovich & Hirata 2010). This effect is caused by a quadratic term in the evolutionary
equations of large scale structure which previously (incorrectly) had been neglected. Although
a small effect, its consequences for reionization and 21cm brightness temperature fluctuations
are expected to be important. Firstly, the effect suppresses star formation in small mass haloes
(e.g. Maio et al. 2011; Fialkov et al. 2011), pushing reionization to somewhat lower redshifts
because of the additional IGM velocity. Secondly, the relative velocity between dark matter
and gas enhances large scale clustering and produces a prominent cosmic web on ~100 cMpc
scales in the 21cm brightness temperature distribution (Visbal et al. 2012). In particular the latter
effect might make the detection of large scale intensity fluctuations much easier at redshifts
as high as ~20. With a low enough frequency capability, SKA-low should be able to study this
effect.



2.3 Cosmology 

The two sections above deal with astrophysics, but the 21cm signal can also be used for more 
fundamental cosmological measurements. This is because inhomogeneities in the HI gas density
field, which contribute to the 21cm signal, should trace those of the underlying CDM, and
thus depend on the fundamental cosmological parameters that define the power spectrum of the linear
density field (Tozzi et al. 2000; Zaldarriaga et al. 2004; Bowman et al. 2007; McQuinn et al.
2006).



Note that, contrary to the CMB, we will be observing the signal across several redshifts, thus 
having access to a large volume, which is an enormous advantage for a proper cosmological 
analysis. Considering, as an example, a full sky experiment with SKA resolution of an arcminute
at z ~ 20 and with depth 10 MHz, the number of independent modes available for
measurement would be $N_{21\mathrm{cm}} \sim 7 \times 10^{10}$, which is $10^3$ times more than what is available in the CMB
(Loeb & Zaldarriaga 2004). A field of 36 deg² would have approximately as many modes as
the full sky CMB.
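
The two mode-count statements above can be cross-checked with simple arithmetic. The sketch below assumes that the number of independent 21cm modes scales linearly with the surveyed sky area, which is an illustrative simplification rather than a statement from the text; the variable names are likewise illustrative.

```python
import math

FULL_SKY_DEG2 = 4.0 * math.pi * (180.0 / math.pi) ** 2  # about 41253 square degrees

N_21CM_FULL_SKY = 7e10   # 21cm modes quoted for a full sky, 10 MHz deep slice
RATIO_TO_CMB = 1e3       # quoted ratio of 21cm modes to CMB modes

n_cmb = N_21CM_FULL_SKY / RATIO_TO_CMB


def modes_in_field(area_deg2: float) -> float:
    """21cm modes in a field of the given area, assuming linear scaling with area."""
    return N_21CM_FULL_SKY * area_deg2 / FULL_SKY_DEG2


# Sky area whose 21cm mode count would equal the full sky CMB mode count.
matching_area_deg2 = FULL_SKY_DEG2 / RATIO_TO_CMB

if __name__ == "__main__":
    print(f"implied CMB modes: {n_cmb:.1e}")
    print(f"area matching the CMB: {matching_area_deg2:.1f} deg^2")
    print(f"modes in 36 deg^2: {modes_in_field(36.0):.1e}")
```

Under this assumption the matching area comes out at about 41 deg², so the quoted 36 deg² field is consistent with the order-of-magnitude ratio of 10³.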

Moreover, 21cm experiments will probe an epoch in the evolution of the Universe that is inac- 
cessible to any other experiment, thus providing a handle on non-standard phenomena such as 
early dark-energy models. Unfortunately, the other "astrophysical" contributions to the 21cm 
signal will complicate the analysis and deteriorate the constraints on the cosmological parameters
(Santos & Cooray 2006; McQuinn et al. 2006; Mao et al. 2008). The cosmological
analysis of the 21cm signal will be essentially based on measurements of its three-dimensional
power spectrum (see Section 3.3) and its evolution across cosmic time, although other observables
could be used to get a better handle on the astrophysical contributions.
In the high precision cosmology era that we are entering, even if 21cm experiments cannot be 
competitive with other experiments such as Planck for the case of the standard cosmological 
model, they will help to put stringent constraints on the reionization history, thus helping to 
break degeneracies with other parameters measured by Planck, such as the running of the primordial
spectral index (Pandolfi et al. 2010). Also, Mao et al. (2008) showed that at lower
redshifts (z < 9), it should be possible to use tomographic measurements with the SKA to improve
the sensitivity to spatial curvature and neutrino masses compared to Planck by a factor of
6 to $\Delta\Omega_k \approx 0.004$ and $\Delta m_\nu \approx 0.056$ eV (using Planck priors). The constraints on the curvature
of the Universe have the advantage that they are less sensitive to uncertainties in the dark energy
equation of state than the CMB alone (Knox 2006).



2.4 New Physics 

Since we are entering uncharted waters, there are many opportunities for discovering new
physics phenomena with SKA-low. Here we summarize some of these. 

• DM annihilation/decay 

Physically motivated Dark Matter (DM) models predict that the DM candidate may either
decay or annihilate into standard model particles (see Bertone et al. 2005 for a review).
The annihilation or the decay of even a fraction of the DM (which may consist
of different species, coupled differently to the Standard Model of particle physics) would
inject a shower of particles in the environment where the annihilation/decay takes place.
The nature and spectrum of such a shower depends on the very nature of the DM, with a
natural endpoint at the mass of the DM particle itself, $m_{\rm DM}$. In models popular today, such
as those arising from Supersymmetry or Kaluza-Klein theories, $m_{\rm DM}$ ranges between
a few GeV and a few TeV: both leptons and hadrons injected at such energies will be partially
absorbed by the environment, thus depositing energy which contributes to heating and ionizing
the IGM.

The alteration of the ionization state of the IGM may be seen through the CMB cross-correlation
power spectra (Padmanabhan & Finkbeiner 2005; Mapelli et al. 2006), and
the forthcoming Planck data may show hints of, or rule out, low mass DM particles
self-annihilating with cross sections at the level required for thermal production in the
early Universe (Galli et al. 2009).
Yet, the CMB loses sensitivity at particle masses higher than $m_{\rm DM} \sim 50$ GeV and for decaying
DM (Galli et al. 2009), whereas the 21cm line is best suited to explore this regime due
to its sensitivity to smaller (and later-timed) energy injections (Furlanetto et al. 2006b).
Expected brightness fluctuations at redshift z ~ 50 are of the order of fractions of a mK,
with an amplitude of ~2 for annihilating DM, and up to an order of magnitude lower for
decaying DM, extending with weaker strength to lower redshifts z ~ 20. At these lower
redshifts the effects of DM decay must be disentangled from astrophysical effects; however,
the characteristic behaviour expected from DM annihilation would make this possible
(Valdes et al. 2012). In light of the different dependence of such fluctuations on DM
parameters (annihilation versus decay, mass, injected primary spectrum), their detection
will help shed light on the nature of DM itself, allowing us to constrain lifetimes $< 10^{27}$ s
and self-annihilation cross sections $\langle\sigma v\rangle \sim 10^{-26}$ cm³/s (for $m_{\rm DM}$ = 100 GeV).

• Evaporating black holes

Many inflationary scenarios predict the production of primordial black holes via the collapse
of overdense peaks in the initial density field. Primordial black holes with masses
in the range $M_{\rm PBH} = 10^{14}$ to $10^{17}$ g will evaporate via the release of Hawking radiation
between recombination and the present day (Ricotti et al. 2008). They therefore represent
a possible source of IGM heating relevant for 21cm studies, which could result in spin
temperature fluctuations (Mack & Wesley 2008). 21cm studies are most sensitive to
the mass range $M_{\rm PBH} \sim 10^{14}$ g; such black holes evaporate in a burst at redshifts z ~ 30 and, at
sufficient number densities, could heat the IGM above the CMB temperature before star
formation began. Higher mass primordial black holes would be hard to distinguish from
decaying DM.

• Cosmic strings

Cosmic strings are one dimensional topological defects that can be produced in particle 
physics phase transitions. As they move they produce a wake that stirs up the IGM induc- 
ing temperature and density fluctuations. Strings were originally put forward as a source 
of cosmological density fluctuations for seeding the growth of structure, although the high 
string densities this requires are now excluded by CMB observations. At lower number 
densities, cosmic strings might still exist and could be constrained via their heating effect
on the IGM (Brandenberger et al. 2010). String wakes would appear as extended wedge-shaped
regions with the string at the tip, seen as emission features in high resolution 21cm
maps. String tensions of $G\mu < 6 \times 10^{-7}$ might be constrained and the strings typically
span a Hubble radius in size. At high redshifts (z > 30), future 21cm experiments should
be able to constrain cosmic strings with tension $G\mu \sim 10^{-11}$ (Khatri & Wandelt 2008).

• Variations in the fine structure constant (α)

The 21cm signal is very sensitive to variations in α (e.g. $\nu_{21\mathrm{cm}} \propto \alpha^4$, $A_{10} \propto \alpha^{13}$) and it
is so far the only probe of the fine structure constant between recombination and z ~ 8.
This effect can in principle be probed at high redshifts since a 1% change in α changes the
signal by > 5% and imprints a characteristic evolution with redshift (Khatri & Wandelt
2007). However, since astrophysical effects are expected to affect the 21cm signal in the
SKA frequency range, it may be hard to detect this signature with SKA.






3 Analysis of redshifted 21cm signal 



The signal we want to observe is the redshifted 21cm signal from neutral hydrogen. This section 
explains in more detail how this signal depends on the astrophysical and cosmological parameters
and the various ways in which it can be analyzed in order to study the topics outlined in
Section 2.



3.1 Description of $\delta T_b$ and its dependencies

The measurable quantity is the differential brightness temperature $\delta T_b$. Equations 1 and 2 in
Section 2 describe how it depends on the local properties of the IGM and on global cosmological
parameters. Before we examine the various contributions, let us summarize the underlying
assumptions of these equations.

• The IGM is assumed homogeneous on kpc scales (within the 21cm line profile)

• The 21cm line is optically thin. The optical depth of the line is given by

$$\tau_{21\mathrm{cm}}(z) = \frac{3 h_p c^3 A_{21\mathrm{cm}}}{32\pi k_B \nu_{21\mathrm{cm}}^2}\, \frac{x_{\rm HI}\, n_{\rm H}}{T_s (1+z) (dv_\parallel/dr_\parallel)} \approx 9.6 \times 10^{-3}\, x_{\rm HI} (1+\delta) \left[\frac{T_{\rm CMB}(z)}{T_s}\right] \left[\frac{1+z}{10}\right]^{3/2} \left[\frac{H(z)/(1+z)}{dv_\parallel/dr_\parallel}\right], \qquad (3)$$

which combined with the radiative transfer solution

$$\delta T_b = (1+z)^{-1} (T_s - T_{\rm CMB}) \left(1 - e^{-\tau_{21\mathrm{cm}}}\right), \qquad (4)$$

for $\tau_{21\mathrm{cm}} \ll 1$ gives the solution in Equation 1. The assumption of low optical depth fails
at δ > 10 in fully neutral regions, which is for example the case in minihalos, DM halos
of masses $< 10^7 M_\odot$.
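
The optically thin assumption amounts to replacing 1 - exp(-τ) by τ in the radiative transfer solution. The sketch below quantifies how good that replacement is for a few illustrative optical depths; the function names and the chosen τ values are assumptions for illustration, not values from the text.

```python
import math


def delta_tb_exact_k(t_spin_k, t_cmb_k, z, tau):
    """Radiative transfer solution: (T_s - T_CMB) (1 - exp(-tau)) / (1 + z)."""
    return (t_spin_k - t_cmb_k) * (-math.expm1(-tau)) / (1.0 + z)


def delta_tb_thin_k(t_spin_k, t_cmb_k, z, tau):
    """Optically thin limit: (T_s - T_CMB) tau / (1 + z)."""
    return (t_spin_k - t_cmb_k) * tau / (1.0 + z)


def thin_relative_error(tau):
    """Fractional overestimate made by the optically thin limit."""
    return tau / (-math.expm1(-tau)) - 1.0


if __name__ == "__main__":
    for tau in (0.001, 0.01, 0.1, 1.0):
        print(f"tau = {tau:5.3f}: relative error = {thin_relative_error(tau):.4f}")
```

For small τ the error grows as roughly τ/2, so it stays at the sub-percent level for τ ~ 0.01 but becomes large once τ approaches unity, as in the dense neutral regions mentioned above.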



For the SKA science case, the most important aspect will be the fluctuations in the signal (see
also Figure 1), since this is where the sensitivity and resolution of the instrument will bring the
largest improvement over previous surveys. We can see from Equation 2 that fluctuations in $\delta T_b$
originate from four different contributions:

1. fluctuations in the matter overdensity $\delta$

2. fluctuations in the hydrogen neutral fraction $x_{\rm HI}$

3. fluctuations in the spin temperature $T_s$

4. fluctuations in the line of sight velocity gradients.
3.1.1 Fluctuations from overdensity



These are the most straightforward to calculate as they result from the growth of cosmic struc- 
tures. On comoving Mpc scales (which is the likely resolution of the SKA), the hydrogen 
density fluctuations are closely correlated to the Dark Matter fluctuations which only depend 
on the assumed cosmology and can be calculated using linear theory for most scales of inter- 
est. Baryon physics will be important mostly for the three other types of fluctuations. Density 
fluctuations dominate during the Dark Ages (at z > 30). They can also dominate at lower red- 
shifts, during the Cosmic Dawn era, if three criteria are met: the average ionization is still very 
small, the spin temperature is already globally coupled to the gas temperature (strong local Ly-α
flux, see below) and the gas temperature is globally much higher than the CMB temperature
so its fluctuations are damped. Such a regime may exist if there is a substantial population of 
X-ray sources during the Cosmic Dawn. Otherwise, overdensity fluctuations are mixed with 
fluctuations in both the spin temperature and neutral fraction. 



3.1.2 Fluctuations from neutral fraction 

Fluctuations in the neutral fraction are connected to the process of patchy reionization itself. 
The period in which these fluctuations are dominant is the one that we refer to as the Epoch 
of Reionization (EoR). The topology of these fluctuations depends on the nature of the sources 
(pervasive ionization for very hard X-ray sources, sharp fronts and bubbles for stellar type 
sources), their clustering properties, and the ionizing flux escaping into the IGM as a function 
of halo mass. Characterizing these fluctuations with respect to the source models is an important 
part of the scientific preparation of the SKA. Fluctuations in the neutral fraction dominate when 
the ionized regions are large enough to fill resolution elements of the telescope, provided the 
spin temperature is fully coupled to the gas temperature and the gas is heated to temperatures 
$\sim 10 \times T_{\rm CMB}$. The spin temperature coupling is expected to occur early, well before any
significant ionization. Substantial heating may happen early but does depend on the amount of
X-rays produced (e.g. Pritchard & Loeb 2011).



3.1.3 Fluctuations from the spin temperature 

Fluctuations in the brightness temperature produced by fluctuations in the spin temperature are
the least straightforward of the four. The local value of the spin temperature is the result of four
competing processes (e.g. Furlanetto et al. 2006a):

1. coupling to the CMB temperature through absorption/re-emission of CMB photons

2. coupling to the gas kinetic temperature through collisions

3. coupling to the color temperature of the local radiation spectrum near the Ly-α frequency
through resonant scattering (Wouthuysen-Field effect)

4. coupling to the local brightness temperature in the vicinity of radio-loud sources.






The last effect occurs near radio-loud sources, when the 21cm photons from that source dominate
over the 21cm photons from the CMB. In what follows we will not consider this localized effect.
As a result $T_s$ can be written as:

$$T_s^{-1} = \frac{T_{\rm CMB}^{-1} + x_\alpha T_c^{-1} + x_c T_K^{-1}}{1 + x_\alpha + x_c} \qquad (5)$$



Here T_c is the color temperature of the Ly-α spectrum, which is almost equal to T_K in situations
relevant for the EoR (see Hirata 2006 for details). The factor x_c, the coupling coefficient
through collisions, is non-negligible only in dense environments. The necessary densities are
reached at the average density for z > 30 or in correspondingly overdense regions at lower
redshifts. The coupling coefficient through Ly-α scattering, x_α, is proportional to the local
Ly-α flux, modulated by a back-reaction factor (Chuzhoy & Shapiro 2006). The local Ly-α flux is
determined by the distribution and luminosity of sources of ultra-violet radiation but also by the
global neutral hydrogen distribution (Semelin et al. 2007). For more details on the computation
of x_α and x_c see e.g. Furlanetto et al. (2006a).
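
As an illustration of how the coupling coefficients set the spin temperature, the following minimal
Python sketch evaluates Equation 5 in its standard form (inverse temperatures weighted by the
coupling coefficients). The function name, the default T_c = T_K and the numerical values are
assumptions made for this example only; they are not taken from the text.

```python
def spin_temperature(t_cmb, t_k, x_alpha, x_c, t_c=None):
    """Spin temperature from Equation 5; all temperatures in kelvin.

    t_cmb   : CMB temperature at the redshift of interest
    t_k     : gas kinetic temperature
    x_alpha : Ly-alpha (Wouthuysen-Field) coupling coefficient
    x_c     : collisional coupling coefficient
    t_c     : Ly-alpha color temperature (assumed equal to t_k if omitted)
    """
    if t_c is None:
        t_c = t_k  # assumption: color temperature tracks the gas temperature
    inv_ts = (1.0 / t_cmb + x_alpha / t_c + x_c / t_k) / (1.0 + x_alpha + x_c)
    return 1.0 / inv_ts


# Hypothetical values for cold gas at z ~ 20 (T_CMB = 2.725 K x 21).
t_cmb, t_k = 2.725 * 21.0, 10.0
for x_alpha in (0.0, 1.0, 100.0):
    print(x_alpha, spin_temperature(t_cmb, t_k, x_alpha, x_c=0.0))
```

Without coupling the spin temperature equals the CMB temperature, and for strong coupling it
approaches the gas temperature, which is the behaviour described above.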

Summarizing, for the epochs that can be studied by the SKA, fluctuations in the spin temperature
are caused by fluctuations in the local kinetic temperature of the gas and fluctuations in
the local Ly-α flux. Depending on the nature of the radiation sources, either the kinetic
temperature or the Ly-α flux can dominate the brightness power spectrum in the early EoR (Santos
et al. 2008; Baek et al. 2010; Pritchard & Loeb 2011). The most likely scenario is that star
forming galaxies produce sufficient ultra-violet photons to achieve complete Ly-α coupling
quite early, perhaps even before z = 20, after which the spin temperature fluctuations are set
by the gas temperature fluctuations. The gas temperature is initially lower than the CMB
temperature, leading to a strong global absorption signal, with fluctuations determined by
adiabatic cooling. As X-ray sources start heating the neutral IGM, strong spin temperature
fluctuations between cold and heated regions will appear, which will slowly disappear as the
medium becomes more uniformly heated.



3.1.4 Fluctuations from peculiar velocity 



Unlike the three former sources of fluctuation, fluctuations induced by local velocity gradients
are statistically anisotropic, since only the projection of the gradient along the line of sight has
an effect on the brightness temperature. Even if, overall, these fluctuations are weaker than the
others, their anisotropic behavior is unique and, in the linear approximation, their power
spectrum can be separated from the other sources of fluctuations (see Section 3.5). Moreover, at
high z, in the linear regime, they are locally proportional to the fluctuations in the density
field, so they can be used to probe the linear growth of structure (Barkana & Loeb 2005a) and
constrain the cosmological parameters (Section 2.3). To derive high quality estimates of the
cosmological parameters from observations with the SKA, however, it will be necessary to go beyond
the simple linear treatment (e.g. Mao et al. 2012). Finally, we note that the bulk flows discussed in
Section 2.2 can substantially enhance the observability of brightness-temperature fluctuations
during the Cosmic Dawn (Visbal et al. 2012; McQuinn & O'Leary 2012), enhancing SKA's
capability to study these higher redshifts in greater detail than previously thought.

Figure 2: A three-dimensional view of the ionization field produced by a numerical simulation.
Red and non-transparent is neutral, blue and transparent is ionized material. Tomography with
the redshifted 21cm line should give us a similar view of the Universe (Figure courtesy of
B. Semelin).



3.2 Tomography and its analysis 

As explained above, during the EoR in a typical region of the IGM, the 21cm line is optically 
thin: once emitted, a photon is redshifted out of the line before it is re-absorbed. Therefore the 
redshifted 21cm signal carries information from the time and place where it was generated and 
thus it enables tomography of the signal. Even in the case where the thermal width of the line
is set by a T_K > 1000 K medium, we could theoretically image several thousand distinct planes
between redshifts 6 and 30. In the case of the SKA, if we aim at a reasonable signal to noise
(S/N) ratio, we will more likely be able to observe a few hundred planes. SKA precursors
will not reach a sufficient S/N for tomography.

To date most studies have focused on statistical quantities such as the power spectrum, which 
require far less sensitivity. What additional benefits are produced from analyzing the tomogra- 
phy? We can distinguish two approaches: 

• Characterizing individual objects 

In statistical diagnostics the local, real-space information is lost. With tomography we
should be able to identify individual features and interpret them. The simplest example
is that of a single isotropic radiation source (a young galaxy, an intermediate mass
black hole, etc.) which creates a distinct, roughly spherical pattern in the 21cm signal.
If the average radial brightness temperature profile can be reconstructed, it can be
compared to templates, and the properties of the source (luminosity and spectrum) can be
inferred. There are, as yet, few works on the subject. Majumdar et al. (2011b) devised
an anisotropic filter to detect individual ionised bubbles around bright quasars and studied
how the age and luminosity of the quasar can be constrained (see also Section 3.8.1).
Vonlanthen et al. (2011) estimated the observability by the SKA of faint rings around ionizing
sources created by the Wouthuysen-Field effect of upper Lyman lines. They emphasize
how both resolution and the ability to stack a large number of objects, and thus a large
FoV, are crucial factors for this type of analysis.

As the prospects of actual observations with SKA come closer, it is likely that many more
ideas will emerge on how to extract information on individual objects and combine them
into statistical properties. Resolution, FoV and sensitivity will be crucial quantities for
the efficiency of this approach, with a large FoV compensating for limited sensitivity to some
degree through the process of stacking. See Section 3.8.1 for more information about
combining 21cm data with other observables from individual bright sources.

• Studying the global topology of the signal

We can also consider statistical or integral quantities that can be computed from tomo- 
graphic data only. While these will be computed from the brightness temperature, it 
should be easy to connect them to the ionization fraction as soon as the high spin temper- 
ature regime is reached. 

A first example is the bubble size distribution and its evolution with redshift (Iliev et al.
2006; Zahn et al. 2007; Friedrich et al. 2011). This is a powerful tool to test the models
against the future tomographic observations, putting constraints on quantities such as the
luminosity distribution of the sources at a given redshift.

Another option is to evaluate topological quantities such as the genus (Ahn et al. 2010) or
its close relative, the Euler characteristic (Friedrich et al. 2011). The evolution of these
quantities with redshift can also put constraints on the source models.

The tomographic exploitation of the redshifted 21cm data is only just beginning. Some of the
works cited above take into account real data limitations such as the resolution by convolving
with a simple beam shape. However, robust predictions will have to factor in effects such as the
sky/detector noise, imperfect foreground subtraction and the complex beam shape.
Ideally we would like to be able to image the 21cm signal down to the resolution of the SKA
core (arcminute scales). This should be feasible down to rather low frequencies, as the contrast
between ionized and neutral regions is about 20 - 30 mK (at z ~ 9). Tomography of the neutral
density field is harder, as the level of density fluctuations at 1' resolution is about 4 - 6 mK
(rms values at z = 20 and 9).



3.3 Power spectrum analysis 

Imaging is powerful, but requires a high S/N per spatial-frequency resolution element (i.e.
voxel). Therefore a need exists for alternative statistical measures that compress many
individually noisy modes into quantities that can be measured with high S/N. At the highest redshifts,
SKA only has the sensitivity to make images on the largest scales and will have to rely largely 
upon statistical measurements to measure small-scale structure. At lower redshifts statistical 
measures are still useful as they characterize the signal with relatively few parameters summa- 
rizing properties that are harder to quantify numerically from images alone. 
The main statistical measure is the power spectrum P(k, z), the Fourier transform of the two 
point correlation function in real space, defined by the relation 

\langle T_b(\mathbf{k}, z)\, T_b(\mathbf{k}', z) \rangle = (2\pi)^3 \delta^3(\mathbf{k} - \mathbf{k}')\, P(\mathbf{k}, z), \qquad (6)

Figure 3: A sample of spherically averaged 21cm power spectra (k^3 P(k)/(2π^2) or Δ^2(k)) of the 21cm
brightness temperature for z > 11. From Santos et al. (2010).



where k is a wavenumber, z is the redshift and δ^3 is the three-dimensional Dirac δ-function.
The power spectrum is a natural quantity to measure from interferometric visibilities,
which themselves represent a Fourier transform of the sky signal. It would contain all the
statistical information if the signal had a Gaussian distribution (as is almost the case for the
primordial density field). The presence of ionized bubbles and heating by astrophysical sources
produces non-Gaussianity in the signal, which requires the use of higher order statistics (see
Section 3.4). However, even when large ionized regions introduce substantial non-Gaussianity
in the 21cm signal, the power spectrum is still useful as it provides information on typical sizes
of HII regions.

The power spectrum is in general 3D, but it is common to consider the spherically averaged
power spectrum P(k, z), with k = |k|. If redshift space distortions are accounted for, we
expect cylindrical symmetry, so that we may write P(k_⊥, k_∥, z), where k_⊥ is the
wavenumber in the transverse direction and k_∥ the wavenumber along the line of sight (McQuinn
et al. 2006). As the 21cm fluctuations evolve as a function of redshift, so does the power
spectrum, making it important to measure it at different redshifts (Pritchard & Loeb 2008).
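
To make the definition concrete, the sketch below estimates the spherically averaged power
spectrum and the dimensionless quantity Δ^2(k) = k^3 P(k)/(2π^2) from a simulated cubic box of
brightness temperature. It is a minimal illustration, not the estimator used in any of the cited
works: the function name, the linear binning between the fundamental and Nyquist wavenumbers and
the use of a periodic box are all assumptions of this example.

```python
import numpy as np


def spherical_power_spectrum(box, box_len, n_bins=10):
    """Spherically averaged P(k) and Delta^2(k) of a periodic cubic box.

    box     : 3D array of brightness temperature (e.g. in mK)
    box_len : comoving side length of the box (e.g. in cMpc)
    Returns the mean k, P(k) and Delta^2(k) of every populated bin.
    """
    n = box.shape[0]
    volume = box_len ** 3
    cell_volume = volume / n ** 3

    # Discrete approximation to the continuous Fourier transform.
    ft = np.fft.fftn(box - box.mean()) * cell_volume
    power = (np.abs(ft) ** 2 / volume).ravel()

    # |k| of every Fourier cell.
    k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box_len / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    kmag = np.sqrt(kx ** 2 + ky ** 2 + kz ** 2).ravel()

    # Linear bins between the fundamental mode and the Nyquist wavenumber.
    edges = np.linspace(2.0 * np.pi / box_len, np.pi * n / box_len, n_bins + 1)
    counts, _ = np.histogram(kmag, bins=edges)
    p_sum, _ = np.histogram(kmag, bins=edges, weights=power)
    k_sum, _ = np.histogram(kmag, bins=edges, weights=kmag)

    good = counts > 0
    k_mean = k_sum[good] / counts[good]
    p_mean = p_sum[good] / counts[good]
    delta2 = k_mean ** 3 * p_mean / (2.0 * np.pi ** 2)
    return k_mean, p_mean, delta2
```

For an uncorrelated (white noise) field with variance σ^2 per cell this estimator should return
a flat P(k) close to σ^2 times the cell volume, which gives a simple sanity check before applying
it to simulated 21cm boxes.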
During the Cosmic Dawn, when Ly-α or X-ray backgrounds drive spin-temperature fluctuations,
detailed power spectrum measurements would yield information about the relative emission
from galaxies and AGN (Santos et al. 2011). Figure 3 shows examples of the spherically
averaged 21cm brightness power spectrum as a function of wavenumber in different redshift
bins during the Cosmic Dawn. The shape of P(k, z) contains information about astrophysical
sources. The models suggest that it is dominated by the clustering of the radiation sources
on large scales and by their radiation profile on intermediate scales (Barkana & Loeb 2005b;
Chuzhoy et al. 2006; Pritchard & Furlanetto 2007).

Table 1: Some values of wave numbers for different angular scales at different redshifts (using
Ω_Λ = 0.73, Ω_m = 0.27, h = 0.7).

z     ν (MHz)   Δθ (5 km)   k_max (cMpc^-1)   FoV (40m station)   k_min (cMpc^-1)
9     142       1.4'        1.6               3.0°                1.3 x 10^-2
14    95        2.2'        0.93              4.5°                7.6 x 10^-3
19    71        2.9'        0.67              6.0°                5.4 x 10^-3
24    57        3.6'        0.52              7.5°                4.2 x 10^-3
29    47        4.3'        0.43              9.0°                4.2 x 10^-3



During reionization the overall shape of the power spectrum is determined by fluctuations in the
neutral fraction. Simulations and theoretical work show that the key quantity that determines
the shape of the power spectrum is the mean neutral fraction x_HI, almost independently of the
redshift or details of the sources (Furlanetto et al. 2004; Zahn et al. 2007; Iliev et al. 2012).
However, if one looks more closely, then the details of the ionizing photon sources and the
statistics of dense neutral photon sinks modify the shape of the power spectrum (McQuinn
et al. 2007b). It is this level of precision that SKA should be targeting. Figure 4 illustrates this
point with the power spectrum for four different ionizing source prescriptions (all normalised to
produce the same neutral fraction). Distinguishing between these different models will be the
driver for power spectrum sensitivity.

In practice, only a limited range of wavenumbers will be observable with sufficiently high signal
to noise. The absolute smallest k is determined by the largest scale within one observation
(FoV) and the absolute largest k by the resolution of the array. Table 1 gives an overview of
typical k_min and k_max values for a given array configuration (see Section 5 for an extensive
discussion on array configurations). At small wavenumbers, the loss of long wavelength modes along
the line of sight from foreground removal is likely to limit power spectrum measurements to
wavenumbers k > 0.01 cMpc^-1. For larger wavenumbers, the increasing thermal noise due to
sparse sampling of long baselines becomes a problem and is expected to limit SKA to scales
k < 5 cMpc^-1. Between these limiting scales it should be feasible to measure the power
spectrum at high precision.
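
The instrumental limits on k can be estimated with a short calculation. The sketch below is a
minimal illustration under stated assumptions: the angular resolution is taken as λ/b for a
maximum baseline b = 5 km, the FoV as λ/D for a station diameter D = 40 m, and an angular scale θ
is converted to a wavenumber through k = 2π/(D_c θ), with D_c the comoving distance in the flat
cosmology quoted for Table 1. These conventions and all function names are choices made for this
example; they are not stated explicitly in the text.

```python
import numpy as np

C_KM_S = 299792.458      # speed of light [km/s]
LAMBDA_21_M = 0.21106    # rest wavelength of the 21cm line [m]
NU_21_MHZ = 1420.406     # rest frequency of the 21cm line [MHz]


def comoving_distance(z, omega_m=0.27, omega_l=0.73, h=0.7, n_steps=10000):
    """Line-of-sight comoving distance [cMpc] in a flat LCDM model."""
    zz = np.linspace(0.0, z, n_steps)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zz) ** 3 + omega_l)
    dz = zz[1] - zz[0]
    integral = dz * (inv_e.sum() - 0.5 * (inv_e[0] + inv_e[-1]))  # trapezoid rule
    return C_KM_S / (100.0 * h) * integral


def k_range(z, max_baseline_m=5000.0, station_diameter_m=40.0):
    """Return (nu [MHz], resolution [arcmin], k_max, FoV [deg], k_min); k in cMpc^-1."""
    wavelength = LAMBDA_21_M * (1.0 + z)
    resolution = wavelength / max_baseline_m   # radians, assumed lambda / b
    fov = wavelength / station_diameter_m      # radians, assumed lambda / D
    d_c = comoving_distance(z)
    k_max = 2.0 * np.pi / (d_c * resolution)
    k_min = 2.0 * np.pi / (d_c * fov)
    return (NU_21_MHZ / (1.0 + z), np.degrees(resolution) * 60.0, k_max,
            np.degrees(fov), k_min)


for z in (9, 14, 19, 24, 29):
    nu, res, k_max, fov, k_min = k_range(float(z))
    print(f"z={z:2d}  nu={nu:6.1f} MHz  res={res:4.1f}'  k_max={k_max:5.2f}"
          f"  FoV={fov:4.1f} deg  k_min={k_min:.1e}")
```

Under these assumptions the z = 9 values agree with the corresponding row of Table 1 to within a
few per cent, which suggests that the table was built with similar conventions.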

Brightness temperature fluctuations from variations in the density, Ly-α flux, gas temperature,
and neutral fraction evolve with redshift. This affects the power spectrum's shape and amplitude
considerably. This overall evolution is captured in Figure 5, which shows the evolution of
P(k, z) as a function of redshift for several wavenumbers. Three different regimes can be
discerned where Ly-α, temperature, and ionization fluctuations come to dominate the overall

Figure 4: A sample of spherically averaged 21cm power spectra (k^3 P(k)/(2π^2) or Δ^2(k)) for four
different reionization models (S1: red solid curves, S2: green dashed curves, S3: black dot-dashed
curves, S4: blue dotted curves). For the top panels, the mass weighted mean ionized fraction (x̄_i,m) is
≈ 0.3, for the middle panels ≈ 0.6 and for the bottom panels ≈ 0.8. The error bars are the expected
detector noise plus cosmic variance errors on the power spectrum for MWA (512 tiles), assuming 1000
h of integration and a bandwidth of 6 MHz. In model S2, reionization is driven mostly by low mass
sources; in model S4, high mass sources dominate the process. For more details about the source models
and other aspects, see McQuinn et al. (2007b), from which this figure was reproduced.



signal. By combining power spectrum measurements at different redshifts these different phases
might be identified (Pritchard & Loeb 2008). Note in particular the large increase in power at
z > 15 during the Cosmic Dawn, which should allow SKA to probe this epoch. A caveat here
is the need to select the bandwidth for individual redshift bins so that power spectrum evolution
is minimised across the bin (the so-called light cone effect, see Section 3.5 and McQuinn et al.
2006; Datta et al. 2011).



Before the power spectrum can be interpreted it must be understood and this requires detailed 
theoretical modeling. So far, this has followed three parallel approaches each with strengths 
and weaknesses. Detailed numerical simulations take a dark matter N-body code and paint 
on a prescription for galaxy formation and radiative transfer to produce simulation volumes. 
These are typically numerically expensive and so restricted to either relatively small volumes 

Figure 5: Evolution of four different k-modes of the spherically averaged 21cm power spectra
(k^3 P(k)/(2π^2) or Δ^2(k)) with redshift. The lowest k-mode clearly shows the three epochs of Ly-α
fluctuations, heating fluctuations and ionization fluctuations. From Santos et al. (2008).



or small parts of parameter space, but are capable of high resolution giving insights into small
scale structures (Mellema et al. 2006; Baek et al. 2010). At the other end of the spectrum are
analytic models, which give useful insights into the power spectrum, especially in terms of its
dependence on different parameters, but which tend to be relatively simple (Furlanetto et al.
2004; Pritchard & Loeb 2008). In between are semi-numerical simulations, which are in some
sense specific realizations of analytic models, that are capable of simulating large volumes with
an acceptable level of resolution (Mesinger et al. 2011; Santos et al. 2010). Analysis of large
data sets will likely require semi-numerical simulations that have been validated by comparison
against fully numerical simulations, but are interpreted with reference to insights from analytic
models. At this point in time no standard framework exists to interpret observed power spectra,
but steps in the direction of such a framework have been taken, see e.g. Lidz et al. (2008) and
Iliev et al. (2012).



3.4 Higher order statistics 

Given the nature of the reionization process the expected signal is non-Gaussian, hence using 
higher order statistics to characterize the data can reveal information that the power spectrum 
does not include. The left hand panel of Figure 6 shows an example of the Probability Density
Function (PDF) of the brightness temperature at four different redshifts; the PDF is clearly non- 
Gaussian in all four cases. Therefore, higher order moments, like the skewness, as a function of 
redshift could be a useful tool for signal extraction in the presence of realistic overall levels of 
foregrounds and noise. Harker et al. (2009b) (see also Gleser et al. 2006; Ichikawa et al. 2010;
Iliev et al. 2012) showed that the skewness of the 21cm signal, under generic assumptions, has a
very characteristic evolution pattern against redshift (the right hand panel of Figure 6). At
sufficiently high redshifts the signal is controlled by the cosmological density fluctuations which, in
the linear regime, are Gaussian. At lower redshifts, and as nonlinearity becomes important, the 
signal starts getting a slightly positive skewness. As the ionization bubbles begin to show up the 
skewness veers towards zero until it crosses to the negative side when the weight of the ionized 
bubbles becomes more important than the high density outliers (note that high density outliers
are likely to ionize first) but the distribution is still dominated by the density fluctuations. At
lower redshifts the bubbles dominate the PDF and the neutral areas become the "new" outliers 
giving rise to a sharp positive peak to the skewness. As redshift 6 is approached the instrument 
noise, assumed to be Gaussian, dominates, driving the skewness again towards zero. To date 
PDFs have only been explored for the EoR and the case of high spin temperature, so that the 
fluctuations are only due to variations in the density and ionized fraction. How they behave 
when substantial variations in the spin temperature exist is unknown. SKA will be sensitive 
enough to quantify higher-order statistics, especially at higher redshift where tomography on 
small angular scales might still remain hard. 
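
The sign changes of the skewness described above can be illustrated with a toy model. The sketch
below computes the skewness (third standardized moment) of a set of brightness temperatures and
applies it to a deliberately crude field: Gaussian fluctuations around a hypothetical mean of
20 mK, with a random subset of cells set to zero to mimic ionized regions. The toy ignores noise,
non-linear density evolution and any correlation between density and ionization, so it only
reproduces the qualitative trend, not the curves of Harker et al. (2009b).

```python
import numpy as np


def skewness(values):
    """Third standardized moment of a set of brightness temperatures."""
    x = np.asarray(values, dtype=float).ravel()
    dev = x - x.mean()
    return np.mean(dev ** 3) / np.mean(dev ** 2) ** 1.5


def toy_brightness(n_cells, ionized_fraction, seed=0):
    """Toy field: Gaussian neutral gas with a random subset of cells set to zero."""
    rng = np.random.default_rng(seed)
    delta_tb = 20.0 * (1.0 + 0.1 * rng.standard_normal(n_cells))  # mK, hypothetical
    ionized = rng.random(n_cells) < ionized_fraction
    delta_tb[ionized] = 0.0
    return delta_tb


for x_i in (0.0, 0.1, 0.9):
    s = skewness(toy_brightness(200000, x_i))
    print(f"ionized fraction {x_i:.1f}: skewness = {s:+.2f}")
```

In this toy the skewness is close to zero for a fully neutral Gaussian field, negative when a
small fraction of cells is ionized, and strongly positive when only a few neutral cells remain.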



3.5 Line of Sight effects / Redshift space distortion analysis 



The LOS velocity gradient introduces an anisotropy in the three-dimensional power spectrum,
as the peculiar velocities shift the signal away from the cosmological redshift along the
(LOS) frequency axis. If one can measure the power spectrum in terms of all three components
of the wave vector k, one can characterize these redshift space distortions. Since only one
direction differs from the others, one can fully characterize the behaviour of the power spectra using
k, the length of the wave vector, and μ, the cosine of the angle between the line of sight and
the wave vector k, or k_∥/k. When only retaining linear terms in the expansion of the full power
spectrum in terms of the power spectra of neutral fraction (x_HI) and density (δ), and assuming
that T_S ≫ T_CMB, one can show that the μ dependence can be written as



P(k, \mu, z) = P_{\mu^0}(k, z) + \mu^2 P_{\mu^2}(k, z) + \mu^4 P_{\mu^4}(k, z) \qquad (7)



where P_{μ^4}(k) = P_δδ, the matter power spectrum (see Bharadwaj & Ali 2004; Barkana &
Loeb 2005a). This conclusion also holds when one allows non-linear fluctuations in x_HI (Mao
et al. 2012). It is this decomposition that opens the road to measuring P_δδ directly from the
redshifted 21cm measurements.
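
In practice the three terms of Equation 7 could be separated, at fixed k and z, by a linear
least-squares fit in powers of μ^2. The sketch below shows this for noiseless synthetic data; the
function name and the input values are hypothetical, and a real analysis would need to weight
each (k, μ) bin by its noise and cosmic variance, which is not done here.

```python
import numpy as np


def fit_mu_decomposition(mu, power):
    """Least-squares fit of P(mu) = P_mu0 + mu^2 P_mu2 + mu^4 P_mu4 at fixed k, z.

    Returns the array [P_mu0, P_mu2, P_mu4].
    """
    mu = np.asarray(mu, dtype=float)
    design = np.column_stack([np.ones_like(mu), mu ** 2, mu ** 4])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(power, dtype=float), rcond=None)
    return coeffs


# Synthetic example with made-up coefficients (arbitrary units).
mu = np.linspace(0.0, 1.0, 21)
true_terms = np.array([5.0, -3.0, 2.0])
power = true_terms[0] + true_terms[1] * mu ** 2 + true_terms[2] * mu ** 4
print(fit_mu_decomposition(mu, power))
```

The recovered μ^4 coefficient is the quantity that, in the linear regime, equals the matter power
spectrum.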

However, the assumption of only linear variations is likely to be invalid through large parts of the
EoR, and there is an additional LOS effect caused by the evolution of x_HI and T_S, the
so-called light cone effect (Barkana & Loeb 2006; Datta et al. 2011). Neither the non-linear nor the
light cone effects have been extensively explored theoretically yet, but they are likely to become more
important at the later stages of ionization. The observed power spectrum may further suffer
from the Alcock-Paczynski (AP) effect when the wrong cosmological parameters are used to
map the angular and frequency coordinates to real space coordinates (Nusser 2005; Barkana
2006). The AP effect adds a μ^6 term to the μ-decomposed power spectrum of Equation 7.

Figure 6: Left panel: The distribution of δT_b in a certain cosmological simulation of reionization from
Iliev et al. (2008) at four different redshifts, showing how the PDF evolves as reionization proceeds.
Note that the y-axis scale in the top two panels is different from that in the bottom two panels. The
delta-function at δT_b = 0 grows throughout this period while the rest of the distribution retains a similar shape.
The bar for the first bin in the bottom-right hand panel has been cut off; approximately 58 per cent of
points are in the first bin at z = 7.78. Right panel: Skewness of the fitting residuals from data cubes with
uncorrected noise, but in which the residual image has been denoised by smoothing at each frequency
before calculating the skewness. The three lines correspond to results from three different simulations
from Thomas et al. (2009) and Iliev et al. (2008). Each line has been smoothed with a moving average
(boxcar) filter of nine points. The grey, shaded area shows the errors, estimated using 100 realizations of
the noise. Both panels are reproduced from Harker et al. (2009b).



The characterization of P(k, μ, z) is the first step in any analysis of all these LOS effects. The
work of McQuinn et al. (2006) and Mao et al. (2008) considered the separation of the different
P_{μ^n} terms in Equation 7. How successful this separation is depends on the behaviour of the
different P_{μ^n} terms, which for P_{μ^0} and P_{μ^2} depends on the details of the reionization process.
The extent to which cosmological information can be extracted depends on the characteristics
of the signal, which will be different for different phases of the CD/EoR. Different approaches
have been proposed.



• No assumptions on astrophysics: In this approach only the P_{μ^4} term is used. Since this
term is typically subdominant, this implies discarding much of the signal (Mao et al.
2012). Still, since this approach does not need any assumptions regarding the astrophysical
processes it is the simplest and it can be used at all redshifts. Estimates show that it
will be hard to extract cosmological information relying only on this effect (Mao et al.
2008). Additionally, the μ-decomposition may be affected by non-linearities in the
velocity field, the effect of which has not been quantified yet (Mao et al., in preparation).

• Simple Astrophysics: Another avenue would be to analyze the full P(k, μ, z) at epochs
where we expect the astrophysical contributions to be particularly simple. For example, if
the radiative coupling and the heating of the neutral IGM are efficient so that T_S ≫ T_CMB
during most of the Cosmic Dawn (before any substantial reionization starts), then the
21cm signal would be just proportional to the DM field (Bowman et al. 2007). The
challenge will be to identify the epoch when this happens, which in principle could be
done by looking at the redshift evolution of large scale modes or the rms of the signal
(Santos et al. 2011) as well as the global signal (Pritchard & Loeb 2008). This could
be further confirmed by imaging a large patch where we do not find any obvious ionized
regions, although there is always the danger of confusion with small fluctuations in the
heating process or the ionization fraction.

• Negligible Astrophysics: An interesting approach, which needs to be further explored,
is to look at very large scales (much larger than the typical "astrophysical scale", such
as the size of ionized regions), so that we can use simple models for the astrophysical
contribution. In this case we can in principle assume that the 21cm signal will be just a
biased tracer of the underlying DM, thus making the cosmological analysis more
straightforward. This was used in Joudaki et al. (2011), where it was shown that an SKA type
experiment can constrain primordial non-Gaussianity at a level comparable to Planck,
thus providing a crucial test of inflationary cosmology. It should also be possible to probe
the Baryon Acoustic Oscillations, thus providing a standard ruler at an interesting time in
the evolution of the Universe. Note however that a major design driver for this will be a large
FoV (i.e. ~ 5 deg).

• Modeling the Astrophysics: The last possibility, and the one which has to be used when
one wants to analyze the full P(k, μ, z) at epochs of substantial reionization, is to try to
fully model the astrophysical components of the 21cm signal, e.g. the ionization fraction
and spin temperature fields (see Equation 1), to constrain the contribution from the
underlying DM density field. This approach needs a full model of all contributions to the
21cm signal from either simple prescriptions of the ionization power spectrum and its
cross-correlation with the density field (McQuinn et al. 2006; Bowman et al. 2007) or
possibly from full simulations.

The analysis of redshift space distortions to date has concentrated on the effect of patchy
reionization. However, SKA observations may provide measurements from periods in which
fluctuations in the spin temperature dominate (Santos & Cooray 2006). This effect from peculiar
velocities could then be used to separate the astrophysical contributions and provide extra
information on the nature of the first objects emitting radiation (Barkana & Loeb 2005a; Santos
et al. 2011).



3.6 The 21cm forest 



An alternative to both the tomography technique from Section 3.2 and the power spectrum
approach from Section 3.3 is to search for the 21cm forest, that is the 21cm absorption against

Figure 7: Upper panels: Spectrum of a radio source positioned at z = 10 (ν ~ 129 MHz), with a
power-law index α = 1.05 and a flux density J = 50 mJy. The red dotted lines refer to the intrinsic
spectrum of the radio source, S_in; the blue dashed lines to the simulated spectrum for 21cm absorption,
S_abs (in a universe where neutral regions remain cold); and the black solid lines to the spectrum for
21cm absorption as it would be seen by LOFAR for an observation time t_int = 1000 h and a frequency
resolution Δν = 20 kHz. Left upper panel: S_abs and S_obs without any smoothing. Middle upper panel:
S_abs and S_obs after smoothing over 10 kHz. Right upper panel: S_abs and S_obs without smoothing and
with 1/10th of the LOFAR noise. Lower panels: The ratio σ_abs/σ_obs corresponding to the upper panels.
Taken from Ciardi et al. (2012).



high-z radio loud sources caused by the intervening cold neutral IGM and collapsed structures
(e.g. Carilli et al. 2002; Furlanetto & Loeb 2002; Furlanetto 2006a; Carilli et al. 2007; Xu et al.
2009; Mack & Wyithe 2011; Xu et al. 2011). In fact the 21cm forest is more than a complement
to tomography or power spectrum analysis. Since the strongest absorption features arise from
small scale structures, the 21cm forest can probe the HI density power spectrum on small scales
not amenable to measurements by any other means.

The photons emitted by a radio loud source at redshift z_s with frequencies ν > ν_21cm will be
removed from the source spectrum with a probability (1 - e^{-τ_21cm}, see Equation 3), absorbed
by the neutral hydrogen present along the LOS at redshift z = (ν_21cm/ν)(1 + z_s) - 1.
Analogously to the case of the Ly-α forest, this could result in an average suppression of the source
flux (produced by diffuse neutral hydrogen), as well as in a series of isolated absorption lines
(produced by overdense clumps of neutral hydrogen), with the strongest absorption associated
with high density, neutral and cold patches of gas.
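
The mapping between frequency and absorber redshift can be made explicit with a few lines of
Python. This is a minimal sketch: the function names are invented for this example, and the
optical depth is treated as a free input rather than computed from the gas properties.

```python
import math

NU_21CM_MHZ = 1420.406  # rest frequency of the 21cm line [MHz]


def absorber_redshift(nu_emitted_mhz, z_source):
    """Redshift at which a photon emitted at nu_emitted (source rest frame)
    has redshifted into the 21cm line: z = (nu_21cm / nu)(1 + z_s) - 1."""
    if nu_emitted_mhz < NU_21CM_MHZ:
        raise ValueError("photons emitted below the 21cm frequency are never absorbed")
    return NU_21CM_MHZ / nu_emitted_mhz * (1.0 + z_source) - 1.0


def observed_frequency(z_absorber):
    """Observed frequency [MHz] of 21cm absorption arising at z_absorber."""
    return NU_21CM_MHZ / (1.0 + z_absorber)


def transmitted_flux(intrinsic_flux, tau_21cm):
    """Flux remaining after absorption with optical depth tau_21cm."""
    return intrinsic_flux * math.exp(-tau_21cm)


# Hypothetical example: a 50 mJy source at z_s = 10 and an optical depth of 0.01.
z_abs = absorber_redshift(1600.0, 10.0)
print(z_abs, observed_frequency(z_abs), transmitted_flux(50.0, 0.01))
```

For absorption at the source redshift z = 10 the observed frequency is about 129 MHz, the value
quoted for the radio source in Figure 7.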

This suggests that the absorption features due to collapsed structures, such as starless minihalos
or dwarf galaxies (Furlanetto & Loeb 2002; Meiksin 2011; Xu et al. 2011), would be easier
to detect than those due to the diffuse neutral IGM. However, this does strongly depend on the
feedback effects acting on such objects. Because of the large uncertainties in the nature and
intensity of high-z feedback effects (for a review see Ciardi & Ferrara 2005 and its updated
version arXiv:astro-ph/0409018), it is not straightforward to estimate the relative importance
of the absorption signals from the diffuse IGM and from collapsed objects.
While gas which has been (even only partially) ionized has a temperature of ~ 10^4 K, gas which
has not been reached by ionizing photons has a temperature which can be as low as that of the
CMB. This neutral gas can be heated by Ly-α or X-ray photons, thus reducing the optical depth
to 21cm. While Ly-α heating is not extremely efficient, heating due to X-ray photons could
easily suppress the otherwise present absorption features (e.g. Mack & Wyithe 2011; Ciardi
et al. 2012). This suggests that observations of the 21cm forest could be used to
discriminate between different IGM reheating histories, in particular if a high energy component
was present in the ionising spectrum.

The most challenging aspect of the detection of a 21cm forest remains the existence of high-z
radio loud sources. Although a QSO has been detected at z = 7.085 (Mortlock et al. 2011), the
existence of even higher redshift quasars is uncertain. The predicted number of radio sources
which can be used for 21cm forest studies in the whole sky per unit redshift at z = 10 varies
in the range 10 - 10^4 depending on the model adopted for the luminosity function of such
sources and the instrumental characteristics (e.g. Carilli et al. 2002; Xu et al. 2009), making
such a detection an extremely challenging task. The possibility of using GRB afterglows has
been suggested by Ioka & Meszaros (2005), who concluded that it will be difficult to observe an
absorption line, even with the SKA, except for very energetic sources, such as massive first stars.
A similar calculation has been repeated more recently by Toma et al. (2011) for massive
metal-free stars, finding that typically the flux at the same frequencies should be at least an order of
magnitude higher than for a standard GRB.

Figure 7 shows the 21cm absorption spectrum due to the diffuse IGM for a bright radio source at
z = 10 (i.e. ν ~ 129 MHz). The intrinsic radio source spectrum, S_in, is assumed to be similar
to Cygnus A, with a power-law with index α = 1.05 and a flux density J = 50 mJy. The
simulated absorption spectrum, S_abs, is calculated from a full 3D radiative transfer simulation
of IGM reionization which resolves scales of ~ 15 kHz (Ciardi et al. 2011). In this simulation
all reionization and heating is done by stellar spectra, leaving the neutral IGM in its cold state.
The observed spectrum, S_obs, is calculated assuming an observation time t_int = 1000 h with the
LOFAR telescope and a bandwidth Δν = 20 kHz. If the spectrum is smoothed over a scale
s = 10 kHz (upper middle panel) or the noise is reduced to 1/10th of its value, similar to what is
expected from the SKA (upper right panel), a clear absorption signal is observed. This is more
evident in the lower panels of Figure 7, which show the quantity σ_abs/σ_obs, where σ_i = S_i - S_in
and i = abs, obs.

As explained above, absorption features due to small collapsed objects can be much stronger
than those due to the diffuse neutral IGM. Since their cross-sections are small, the best
conditions for detecting them occur when Ly-α coupling pushes the spin temperature in their
lower-density outskirts to the gas temperature before these regions have been affected by any
heating (see figure 22 in Meiksin 2011), conditions expected above z ~ 10. However, even
after heating has started to suppress the 21cm absorption signal, some weak features due to
collapsed structures may remain. Interestingly, even when it may not be possible to detect these
weak features individually, they may be detected statistically through the excess brightness
fluctuations they would produce over the telescope noise (see figure 29 in Meiksin 2011).

Figure 8: Evolution of the global 21cm signal from the Dark Ages to the end of reionization. Taken
from Pritchard & Loeb (2010).



3.7 Global Signal 

Tomography, power spectrum and 21cm forest measurements all give us information about
21cm fluctuations. Complementary to this would be measurements of the mean 21cm signal,
referred to as the 'global signal' (Shaver et al. 1999; Furlanetto 2006b). This requires an
absolute measurement of the 21cm brightness and can be considered to be the 21cm equivalent
of the COBE/FIRAS black body measurement (Mather et al. 1990). In contrast, radio
interferometers typically measure brightness fluctuations in the same way as WMAP observations of
CMB anisotropies. Figure 8 shows the expected features in the global 21cm signal (Pritchard
& Loeb 2010). At ν ~ 15 MHz, during the dark ages, an absorption feature appears due to
collisional coupling to an unheated IGM that has been cooling adiabatically since
recombination. This first absorption feature is determined by fundamental physics alone, but these low
frequencies are unlikely to be accessible from the ground (Jester & Falcke 2009). The second
absorption feature at ν ~ 60 MHz occurs after star formation begins producing Ly-α photons,
which couple spin and gas temperatures. Initially this leads to a deep absorption feature, but
as sources of X-rays form and heat the IGM this absorption feature transitions into an emission
feature. Progressive galaxy formation leads to ionizing UV photons that ionize the Universe
and remove the 21cm signal altogether.
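
The frequencies quoted above map onto redshift through the rest frequency of the 21cm line,
ν_obs = ν_21 / (1 + z). The short sketch below makes that conversion explicit; the function names
are illustrative only, and the rest frequency of 1420.40575 MHz is the standard laboratory value
rather than a number taken from this document.

```python
# Rest-frame frequency of the 21cm hyperfine transition in MHz (standard value).
NU_21CM_MHZ = 1420.40575


def redshift_from_frequency(nu_obs_mhz: float) -> float:
    """Redshift at which the 21cm line is observed at nu_obs_mhz."""
    return NU_21CM_MHZ / nu_obs_mhz - 1.0


def frequency_from_redshift(z: float) -> float:
    """Observed frequency in MHz of the 21cm line emitted at redshift z."""
    return NU_21CM_MHZ / (1.0 + z)


if __name__ == "__main__":
    # The two absorption features of the global signal discussed in the text.
    for nu in (15.0, 60.0):
        print(f"nu = {nu:5.1f} MHz  ->  z = {redshift_from_frequency(nu):5.1f}")
    # The 21cm forest example: a background source at z = 10.
    print(f"z = 10  ->  nu = {frequency_from_redshift(10.0):.1f} MHz")
```

The dark-ages feature near 15 MHz thus corresponds to z of order 90, and the Ly-α coupling
feature near 60 MHz to z of roughly 20 to 25.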

From this one-dimensional spectral measurement a few key pieces of information could be
extracted. The positions of the various turning points would pin down the redshifts when the first
galaxies and X-ray sources formed and when reionization began and ended (Furlanetto 2006b;
Pritchard & Loeb 2010). From this one could constrain the star formation rate, X-ray luminosity,
and UV photon emissivity of early galaxies as a function of redshift. More detailed analysis
of the 21cm signal might measure the thermal history of the IGM and so reveal the presence of
exotic heating sources (see Section 2.4).

Measuring the global signal can only be done using the auto-correlations of the SKA telescope,
since a constant brightness temperature of HI provides no signal in the cross-correlation. We
discuss this, together with its implications for SKA, in Section 5.3.1.



3.8 Connecting with other observables and telescopes 

In this section we consider how combining SKA observations of the redshifted 21cm signal with 
other observations can teach us more about the Epoch of Reionization and the Cosmic Dawn. 



3.8.1 Individual QSOs 

The role of quasars during the EoR is a topic of debate. While they are generally believed to be
important in heating the IGM (Zaroubi et al. 2007; Baek et al. 2010), it has been argued that
the space density of quasars at high redshifts is too small to provide a significant contribution
to the ionizing flux (Loeb 2009; Schmidt et al. 1995; Boyle et al. 2000; Cristiani et al. 2004;
Gonzalez-Serrano et al. 2005). However, observations indicate that there may not be enough
galaxies to fully ionize the universe (Bunker et al. 2010), and it has been claimed that quasars
must play an important role after all, at least at lower redshifts (Volonteri & Gnedin 2009; Trac
& Gnedin 2011).

Apart from the need to understand the role of quasars in ionizing the IGM, there are many
unanswered questions regarding their intrinsic properties at high redshift. The observation of
QSOs with a central black hole mass of ~10^9 M☉ already at redshifts 6.5 < z < 7.0 (Fan
et al. 2006; Mortlock et al. 2011) raises questions about the formation and growth scenarios
for supermassive black holes.

With the SKA we will be able to follow up quasars found with optical and near-IR data and
study many of the unanswered questions above. Among other things, we would be able to study
the properties of these quasars, including how many are active, their lifetimes and to what extent
they contribute to the ionization of the IGM, by studying their HII regions. Obscured (e.g.
type-II) quasars, however, are harder to find and might require X-ray surveys or detection with
the SKA itself (e.g. very-steep-spectrum radio sources tend to be at higher redshifts).

The main near-IR surveys in the run-up to the SKA are those with the VISTA telescope and
with the future Euclid satellite. By extrapolating measurements of the luminosity function
of quasars at redshift z ~ 6, Willott et al. (2010) estimated number densities out to z = 9.
Uncertainties, especially in the knee of the mass function, could imply smaller
number densities for the wider surveys, as there is a minimum amount of time required for these
sources to assemble.

However, from these extrapolations, one finds that VISTA-related surveys such as VIKING
(1500 sq deg to H = 19.9), VIDEO (15 sq deg to H = 23.7) and UltraVISTA (0.73 sq deg
to 25.4) should all find several quasars at 6.5 < z < 7.5 and a few at 7.5 < z < 8.5. Euclid
will have two surveys: a wide, shallow survey (15000 sq deg to H = 24) and a deep survey (40 sq
deg to H = 26). The wide survey should be able to detect around 1 quasar per 20 sq deg at z ~ 7,
1 quasar per 50 sq deg at z ~ 8 and 1 quasar per 200 sq deg at z ~ 9.
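
To get a feel for what these surface densities mean for the full Euclid wide survey, one can
simply divide the survey area by the area per quasar. The sketch below does only that; it
assumes the quoted densities apply uniformly over the whole 15000 sq deg and ignores
completeness and selection effects, and the names used are illustrative.

```python
# Survey area and surface densities quoted in the text for the Euclid wide survey.
EUCLID_WIDE_AREA_SQ_DEG = 15000.0
# Redshift -> sky area (sq deg) per detectable quasar, as quoted in the text.
SQ_DEG_PER_QUASAR = {7: 20.0, 8: 50.0, 9: 200.0}


def expected_quasar_count(survey_area_sq_deg: float, sq_deg_per_quasar: float) -> float:
    """Expected number of quasars, assuming a uniform surface density (an assumption)."""
    return survey_area_sq_deg / sq_deg_per_quasar


if __name__ == "__main__":
    for z, area_per_quasar in SQ_DEG_PER_QUASAR.items():
        n = expected_quasar_count(EUCLID_WIDE_AREA_SQ_DEG, area_per_quasar)
        print(f"z ~ {z}: about {n:.0f} quasars over {EUCLID_WIDE_AREA_SQ_DEG:.0f} sq deg")
```

Under these assumptions the wide survey would contain several hundred quasars at z ~ 7 and
z ~ 8, and several tens at z ~ 9.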

Analysis of the spectra of optically detected high redshift QSOs reveals that large HII regions
(10-100 cMpc in radius; Fan et al. 2006; Carilli et al. 2010) are associated with these objects.
Targeted observations of HII regions around known luminous QSOs will provide unique and
more detailed information about their size and shape (Wyithe et al. 2005; Datta et al. 2008;
Majumdar et al. 2011a; Datta et al. 2012). The anisotropy in the HII region shape, which
may arise due to the rapidly expanding ionization front (Shapiro et al. 2006) and the finite light
travel time (Wyithe et al. 2005; Yu 2005; Majumdar et al. 2011a), can also be probed with SKA.
This information can further be used to calculate the QSO luminosity and age with higher
accuracy, providing crucial parameters for understanding the formation of SMBHs during the
EoR (Majumdar et al. 2011b; Datta et al. 2012). In addition, measurements of the contrast in
21cm emission between HII regions and the surrounding regions will provide measurements of
the hydrogen neutral fraction of the outside IGM (Geil & Wyithe 2008). These measurements
will be complementary to power spectrum measurements. Identification of large HII regions in
the SKA 21cm tomography can guide a search for bright QSOs and galaxies in the middle of
these regions (Wyithe et al. 2005; Datta et al. 2012).



3.8.2 Galaxy surveys 

To alleviate some of the problems associated with observations of the weak 21cm signal, several
cross-correlation analyses with observations in other frequency windows have been proposed.
The idea is that the noise and systematics in two observations at different frequencies and with
different strategies might cancel out. An exciting possibility would be a cross-correlation with
galaxy surveys (Lidz et al. 2009; Wiersma et al. 2012). Even if SKA has a high enough sensitivity
not to need cross-correlation techniques in order to detect the signal, cross-correlating with other
probes will improve our understanding of the process of reionization. In the case of galaxies, it
will specifically help in answering the question of which types of galaxies are mostly responsible
for reionization.



Following Lidz et al. (2009) one can define the cross power spectrum between the 21cm
emission and the galaxies as:

\[
\Delta^2_{21,\mathrm{gal}}(k) = \tilde{\Delta}^2_{21,\mathrm{gal}}(k) / \delta T_{b0}
  = x_{\mathrm{HI}} \left[ \Delta^2_{x_{\mathrm{HI}},\mathrm{gal}}(k)
  + \Delta^2_{\rho,\mathrm{gal}}(k)
  + \Delta^2_{x_{\mathrm{HI}}\rho,\mathrm{gal}}(k) \right]
\tag{8}
\]

where δT_b0 is the 21cm brightness temperature relative to the CMB for neutral gas at the mean
density of the universe, x_HI is the neutral fraction and Δ²_a,b(k) is the dimensionless cross power
spectrum between fields a and b. In order to construct the cross power spectrum, one therefore
requires three fields: the density field, ρ, the neutral hydrogen field, x_HI, and the galaxy field,
gal, which can be obtained via numerical simulations of galaxy formation and the reionization
process.

Figure 9: The circularly averaged, unnormalized 2D 21cm - galaxy cross power spectrum
(Δ²_21,gal(k); upper panel) and correlation coefficient (lower panel) for various redshifts/mean
neutral fractions for a Ly-α emitter survey. Taken from Wiersma et al. (2012).

It is found that the 21cm emission is initially correlated with galaxies on large scales,
anti-correlated on medium scales, and uncorrelated on small scales. This picture quickly changes as
reionization proceeds and the two fields become anti-correlated on large scales. It is apparent that
these (anti-)correlations can be a powerful tool for indicating the topology of reionization and
should be an important diagnostic tool for SKA observations.
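
The sign of the correlation on a given scale can be read off from the cross-correlation
coefficient r(k) = P_ab(k) / sqrt(P_aa(k) P_bb(k)). The following one-dimensional toy example is
only a sketch of that statistic and not the estimator used in the cited simulations: the two
"fields" are hand-built sine waves (a hypothetical construction) that are anti-correlated in a
large-scale mode and in quadrature, hence uncorrelated, in a small-scale mode.

```python
import cmath
import math


def dft(values):
    """Plain discrete Fourier transform (O(n^2); adequate for a toy example)."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def correlation_coefficient(field_a, field_b, min_power=1e-9):
    """r(k) = Re[A(k) B*(k)] / sqrt(|A(k)|^2 |B(k)|^2) for modes where both fields have power."""
    fa, fb = dft(field_a), dft(field_b)
    r = {}
    for k in range(1, len(field_a) // 2):
        p_aa = abs(fa[k]) ** 2
        p_bb = abs(fb[k]) ** 2
        if p_aa > min_power and p_bb > min_power:
            r[k] = (fa[k] * fb[k].conjugate()).real / math.sqrt(p_aa * p_bb)
    return r


def toy_fields(n=64):
    """Hypothetical galaxy and 21cm fields: opposite sign at k=2, in quadrature at k=10."""
    galaxies, signal_21cm = [], []
    for i in range(n):
        large_scale = math.cos(2 * math.pi * 2 * i / n)
        galaxies.append(large_scale + math.cos(2 * math.pi * 10 * i / n))
        signal_21cm.append(-large_scale + math.sin(2 * math.pi * 10 * i / n))
    return galaxies, signal_21cm


if __name__ == "__main__":
    gal, sig = toy_fields()
    for k, value in sorted(correlation_coefficient(gal, sig).items()):
        print(f"k = {k:2d}: r = {value:+.2f}")
```

In this toy, r is -1 for the large-scale mode and 0 for the small-scale one, mimicking the
qualitative behaviour described above once reionization is under way.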

If the effect of observing and selecting real galaxies is taken into account, the result depends
on the observational campaign considered. For example, for a drop-out technique (as in
observations of Lyman Break Galaxies), the normalization of the cross power spectrum seems to be
the most powerful tool for probing reionization. In particular, it is quite sensitive to the ionized
fraction, as different reionization histories yield similar cross power spectra for a fixed ionized
fraction. When instead a more precise measurement of the galaxy redshifts is available (as in
Ly-α emitter surveys), so that the three-dimensional position of each galaxy is known, much
more information about the nature of reionization can be extracted, as both the shape and the
normalization of the cross power spectrum provide useful information. In addition, the
observability of the Ly-α line from these galaxies is affected by neutral patches in the IGM, and thus
Ly-α emitter surveys are particularly useful for EoR studies (McQuinn et al. 2007a; Jensen
et al. 2012).

Figure 9 shows the 21cm - Ly-α emitter cross power spectrum and correlation coefficient for
a number of redshifts. Here the noise assigned to the 21cm survey is that of the LOFAR
telescope, while the Ly-α emitter survey has the same characteristics as the one described in
Ouchi et al. (2010) with the Subaru telescope. The effect that neutral patches in the IGM have on
the observability of Ly-α emitters is not included here.

3.8.3 Background radiation surveys



Apart from surveys of individual objects described above, there are also background surveys 
which can be correlated with the 21cm signal. Of these the near-infrared background (NIRB), 
X-ray background (XRB) and the backgrounds from redshifted molecular lines, are all associ- 
ated with structures from the periods that SKA can study. The Cosmic Microwave Background 
(CMB) dates of course from the Epoch of Recombination, but the EoR is expected to leave its 
imprint on it. 

The cross-correlation of the 21cm signal with the XRB appears not to have been studied in any
detail and we do not discuss it here.



• Near Infrared Background

In the near-infrared spectral region of 1 to 5 μm, the sky shows a faint excess emission
of extragalactic origin. Although expected and searched for since at least the 1960s (see
Hauser & Dwek 2001 for a review), it was first measured in this wavelength range in
the year 2000 by a combination of IRTS and COBE data (Gorjian et al. 2000; Matsumoto
et al. 2000; Wright & Reese 2000). Such measurements are difficult due to the presence
of the strong interplanetary zodiacal light but seem to indicate a flux level of νI_ν ~ 10
nW m^-2 sr^-1 above what is expected from the known galaxy population. Measurements
of the fluctuation power spectrum at different wavelengths appear to be much less affected
by zodiacal light and yield fluctuation levels of ≈ 0.1 nW m^-2 sr^-1 (Kashlinsky et al.
2003, 2012 at 3.6, 4.5, 5.8 and 8 μm with Spitzer; Matsumoto et al. 2011 at 2.4 and 3.1 μm with
AKARI; Thompson et al. 2007 at 1.1 and 1.6 μm with NICMOS/HST).

Theoretically, the most exciting interpretation of the NIRB is that it originates from the
many small and faint galaxies at z > 6, some of which could still have massive metal-free
(PopIII) stars (see e.g. Santos et al. 2002; Salvaterra & Ferrara 2003). Small galaxies are
thought to dominate the ionizing photon budget during the EoR, but are as yet undetected
in the deepest current surveys. If true, the NIRB would be an exquisite tool to study high-z
galaxies and reionization, as it would probe all sources rather than only the brightest ones.
Models indicate, for example, that the power spectrum of fluctuations could distinguish
between populations of galaxies with different clustering properties (see Figure 10 and
Fernandez et al. 2010, 2012; Cooray et al. 2012; Yue et al. 2012).

However, the measured intensity of the NIRB is found to be ~10 times larger than can be
accommodated, either theoretically or observationally, by stars during reionization
(Madau & Silk 2005; Salvaterra & Ferrara 2006), and the predictions for the amplitude
of fluctuations are also typically below the measured data points (Cooray et al. 2012; Yue
et al. 2012). This indicates that our understanding of this background radiation remains
incomplete and that, in addition to the contribution from faint high-z galaxies, another as yet
unknown foreground must dominate the observed NIRB. The fact that this other component
appears to have a clustering signal very similar to that of the EoR galaxies is
rather puzzling, but may indicate that it arises from associated phenomena such as the
gravitational energy release from quasar-like sources (Yue et al. 2012).

Figure 10: NIRB sky power spectrum signals calculated theoretically, based on different simulations
of the galaxy distribution and reionization patchiness, compared to observational results at 1.6 μm from
the NICMOS camera, 2.4 μm from AKARI, and 3.6 μm from IRAC, as labelled. From Fernandez et al.
(2012).



Cross-correlating NIRB fields with 21cm measurements could give a clear signal if the
NIRB is dominated by faint galaxies from the EoR. One should however keep in mind
that the NIRB signal comes from a wide range of redshifts and does not carry precise
redshift information, which will weaken the correlation signal at any given frequency.
A theoretical investigation of the expected correlation signal would be useful to make
firmer statements on how strong a signal can be expected. Still, a NIRB-21cm
cross-correlation study should in principle be able to show whether the NIRB mostly originates
from the EoR, thus solving the enigma of its origin. If it does, it would allow us to make
statements about the clustering of the faint galaxy population most likely responsible for
the reionization process and thus improve our understanding of the physics of the EoR.
The use of large fields (> 1°) is required to prevent the faint low-z sources present in the
NIRB data from influencing the result.

• Cross-correlation with Intensity Mapping of Molecular and Fine Structure Lines 

The NIRB provides a signal integrated over many redshifts. An interesting alternative
is the background caused by molecular and fine structure lines. These lines are
generated in star-forming galaxies and thus trace the star formation history. The two main
species that have been considered are CO and CII. Even if experiments targeting CII or
CO lines do not have the necessary sensitivity and resolution to probe individual galaxies
during this epoch, the brightness variations of the line intensity can be used to map the
underlying distribution of galaxies and DM (Basu et al. 2004; Visbal & Loeb 2010).

Figure 11: The cross-correlation coefficient r(k) of the CII (with 10 m dish) and 21cm emission (with
SKA) at z = 6, z = 7 and z = 8. The error bars of r are also shown (red solid), and the blue dashed
ones are the contribution from the 21cm emission. We find the 21cm noise dominates the errors at z = 6
and 7. In the top panel, the r_CII,HI(k) at z = 7 (green dashed) and z = 8 (cyan dotted) are also shown to
illustrate the evolution of the ionized bubble size at these redshifts relative to z = 6. Taken from Gong
et al. (2012).



Recently it has been proposed to use rotational lines of CO molecules to probe reionization
(e.g., Gong et al. 2011; Carilli 2011; Lidz et al. 2011), showing that a measurement
of the cross-correlation should be achievable even for LOFAR (Gong et al. 2011). The
CII line, on the other hand, is generally the brightest emission line in star-forming galaxy
spectra, contributing about 0.1% to 1% of the total far-infrared luminosity, and will
probe the onset of star formation and metal production in z ~ 6-8 galaxies. Gong et al.
(2012) analysed the possibility of intensity mapping using this line, showing that a
cross-correlation with results from an SKA phase 1 experiment should generate a high
signal-to-noise ratio (Figure 11). Such a detection will provide statistical information on the typical
bubble size (when the correlation changes from negative to positive) and thus probe the
astrophysical processes occurring inside the ionized bubbles at a statistical level.



• Cosmic Microwave Background

One of the leading sources of secondary anisotropies in the CMB is the scattering
of CMB photons off free electrons (Zeldovich & Sunyaev 1969). The anisotropies
induced by thermal motions of free electrons are called the thermal
Sunyaev-Zel'dovich effect (tSZ), and those due to bulk motion of free electrons the
kinetic Sunyaev-Zel'dovich effect (kSZ). The latter dominates during reionization
(for a review of secondary CMB anisotropies see, e.g., Aghanim et al. 2008).

The kSZ effect from a homogeneously ionized medium, i.e., with the ionized fraction only
a function of redshift, has been studied both analytically and numerically by a number
of authors; the linear regime of this effect was first calculated by Sunyaev & Zeldovich
(1970) and subsequently revisited by Ostriker & Vishniac (1986) and Vishniac (1987),
and is hence also referred to as the Ostriker-Vishniac (OV) effect. In recent years various
groups have calculated this effect in its non-linear regime using semi-analytical models
and numerical simulations (Gnedin & Jaffe 2001; Santos et al. 2003; Zhang et al. 2004).
These studies show that the contributions from non-linear effects are only important at
small angular scales (ℓ > 1000), while the OV effect dominates at larger angular scales.

The kSZ effect from patchy reionization was first estimated using simplified semi-analytical
models by Santos et al. (2003), who concluded that it dominates over that of a
homogeneously ionized medium. More detailed modeling of the effect of patchy reionization
was subsequently performed using numerical simulations (Salvaterra et al. 2005;
Iliev et al. 2007) and semi-analytical models (McQuinn et al. 2005; Zahn et al. 2003;
Mesinger et al. 2012). Doré et al. (2007) used numerical simulations to derive the
expected CMB polarization signals due to EoR patchiness. The CMB bolometric arrays
Atacama Cosmology Telescope (ACT, Fowler et al. 2010) and South Pole Telescope
(SPT, Shirokoff et al. 2011) are currently being used to measure the CMB anisotropies
at the scales relevant to reionization (3000 < ℓ < 8000). The SPT results are starting to
put limits on the duration of reionization (Zahn et al. 2011).

Cross-correlation between the cosmological 21cm signal, as measured with SKA, and
the secondary CMB anisotropies provides a potentially useful statistic. As in the cases
described above, the cross-correlation has the advantage that the measured statistic is less
sensitive to contaminants such as foregrounds, systematics and noise than
"auto-correlation" studies.

Analytical cross-correlation studies between the CMB temperature anisotropies and the
EoR signal on large scales (ℓ ~ 100) were carried out by Alvarez et al. (2006); Adshead
& Furlanetto (2008); Lee (2009), and on small scales (ℓ > 1000) by Cooray (2004);
Salvaterra et al. (2003); Slosar et al. (2007); Tashiro et al. (2008, 2011). Cross-correlation
of the E- and B-modes of CMB polarization with the redshifted 21cm signal was
studied by Tashiro et al. (2008); Dvorkin et al. (2009). Numerical studies of the
cross-correlation were carried out by Salvaterra et al. (2005); Jelić et al. (2010b).

These studies showed that the kSZ and the redshifted 21cm signal: (i) anti-correlate on the
scales corresponding to the typical size of ionized bubbles; and (ii) correlate on larger
scales, where the patchiness of the ionized bubbles is averaged out (see Fig. 12). The
significance of the anti-correlation signal depends on the reionization scenario (Salvaterra
et al. 2005; Jelić et al. 2010b; Tashiro et al. 2011).

Figure 12: An example of the cross-power spectrum of the kSZ and the cosmological 21cm signal at
z = 11. The solid line is for a 'patchy' reionization history, while the dashed line is for a 'homogeneous'
history. Taken from Tashiro et al. (2011).

Unfortunately, the cross-correlation signal turns out to be difficult to detect, even in
radical reionization cases, assuming typical SKA and Planck characteristics (Tashiro et al.
2008, 2011). However, the kSZ signal induced during the EoR could possibly be detected
in the power spectra of the CMB and used to place some additional constraints on this
epoch in the history of our Universe.

4 Observational Challenges & Strategies



In this section we outline some of the important issues to consider when designing a CD/EoR
observational strategy and basic design reference for SKA-low.

4.1 Field size: sample variance and discovery space

The optimal observational field sizes and locations will be determined by a number of
competing requirements, both scientific and technical. While in general it would be desirable to
maximize the size of fields in order to obtain better statistics, there is a trade-off between the
field size and the array design: (1) A large FoV requires small station sizes, which for a fixed
collecting area means a large number of stations and hence a high computing cost for correlation.
We therefore need to identify the minimum requirements for the SKA survey field to achieve
the desired EoR science goals. (2) Furthermore, given a finite sky coverage, the question is
where this coverage should be placed to gain maximum synergy with other surveys, while at the
same time maximizing the conditions for good quality data.
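
The trade-off in point (1) can be made concrete with a rough scaling sketch. It assumes a
field of view of order λ/D for a station of diameter D, filled circular stations, and a
hypothetical total collecting area of 1 km^2; none of these numbers are design values from this
document, and the function names are illustrative.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light


def field_of_view_deg(station_diameter_m: float, frequency_mhz: float) -> float:
    """Rough field of view ~ lambda / D, in degrees (order-of-magnitude approximation)."""
    wavelength_m = C_M_PER_S / (frequency_mhz * 1e6)
    return math.degrees(wavelength_m / station_diameter_m)


def station_count(total_area_m2: float, station_diameter_m: float) -> float:
    """Number of filled circular stations needed to reach the total collecting area."""
    return total_area_m2 / (math.pi * station_diameter_m ** 2 / 4.0)


def baseline_count(n_stations: float) -> float:
    """Number of station pairs; the correlation cost scales with this."""
    return n_stations * (n_stations - 1.0) / 2.0


if __name__ == "__main__":
    total_area_m2 = 1.0e6  # hypothetical collecting area, for illustration only
    for diameter_m in (23.0, 35.0, 70.0):
        n = station_count(total_area_m2, diameter_m)
        print(
            f"D = {diameter_m:4.0f} m: FoV ~ {field_of_view_deg(diameter_m, 150.0):4.1f} deg, "
            f"stations ~ {n:6.0f}, baselines ~ {baseline_count(n):9.0f}"
        )
```

Halving the station diameter doubles the field of view but quadruples the number of stations,
so the number of baselines to correlate grows by roughly a factor of sixteen.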

The main requirement for survey size is the desire to observe a representative sample of the
Universe. This is important for minimizing sample variance (occasionally equated to cosmic
variance in the literature) in statistical measurements of the power spectrum (see also Section 5).
Crudely, the number of modes with wavenumber k that fit into a survey volume V is given
by N_k = 4π ε k^3 V/(2π)^3 for logarithmic bins of width Δk = εk, and the uncertainty in the power
spectrum from sample variance around redshift z ~ 10 is

\[
\frac{\delta P_{21}}{P_{21}} \approx 0.01
  \left( \frac{k}{0.1\,\mathrm{Mpc}^{-1}} \right)^{-3/2}
  \left( \frac{V}{1\,\mathrm{Gpc}^{3}} \right)^{-1/2}
  \left( \frac{\epsilon}{0.5} \right)^{-1/2}
\tag{9}
\]

Thus, assuming ε = 0.5, a volume of 1 Gpc^3 is required to reduce sample variance to the 1%
level on scales ~0.1 Mpc^-1, where the EoR signal is likely to be greatest. In Section 5.2.1 we
further discuss the requirements for power-spectrum determinations, based on noise rather than
sample variance and also on both combined.
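
The mode-counting argument can be checked numerically. The sketch below evaluates
N_k = 4π ε k^3 V/(2π)^3 and takes the fractional sample-variance error to be 1/sqrt(N_k); that
last step is an assumption which ignores order-unity factors (for example, how independent modes
of a real field are counted), so the result should be read as an order-of-magnitude estimate.

```python
import math


def mode_count(k_per_mpc: float, volume_mpc3: float, epsilon: float) -> float:
    """Number of Fourier modes in a logarithmic bin of width epsilon * k within volume V."""
    return 4.0 * math.pi * epsilon * k_per_mpc ** 3 * volume_mpc3 / (2.0 * math.pi) ** 3


def sample_variance_fraction(k_per_mpc: float, volume_mpc3: float, epsilon: float) -> float:
    """Fractional power-spectrum error, assumed here to be 1 / sqrt(N_k)."""
    return 1.0 / math.sqrt(mode_count(k_per_mpc, volume_mpc3, epsilon))


if __name__ == "__main__":
    gpc3_in_mpc3 = 1.0e9
    for volume_gpc3 in (0.1, 1.0):
        frac = sample_variance_fraction(0.1, volume_gpc3 * gpc3_in_mpc3, 0.5)
        print(f"V = {volume_gpc3:3.1f} Gpc^3: sample variance ~ {100.0 * frac:.1f}%")
```

The error scales as V^(-1/2), so a tenfold smaller volume increases the sample-variance error
by a factor of about three.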

A rough scaling relation for the comoving volume of a cylindrical survey of angular diameter θ
and bandwidth Δν, accurate to ~10% over the relevant redshift range for EoR observations
(z ≈ 6-30), is

\[
V \approx 0.1\,\mathrm{Gpc}^{3}
  \left( \frac{\theta}{5^{\circ}} \right)^{2}
  \left( \frac{\Delta\nu}{10\,\mathrm{MHz}} \right)
  \left[ (1+z)^{1/2} - 2 \right]
\tag{10}
\]

We note that multi-beaming is not included in this volume calculation. The redshift dependence
is relatively weak, about a factor of three over the full redshift range, but is the main
source of the fit error here. A field of view 5° across corresponds to a transverse comoving
distance of 1 Gpc, while 10 MHz gives a line-of-sight comoving depth of ~0.2 Gpc. The
take-away point of this back-of-the-envelope calculation is that fields 5° across are sufficient to
allow for sample-variance-limited errors of ~3% on the scales of greatest interest for the 21cm
power spectrum. To go to ~1% sample variance errors requires 10 such beams, either through
multi-beaming (see also Section 5.3) or sequentially.
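
The quoted transverse size and line-of-sight depth can be cross-checked with a direct
integration. The sketch below assumes a flat ΛCDM cosmology with H0 = 70 km/s/Mpc and
Ω_m = 0.3 and neglects radiation; these parameter values and the function names are assumptions
made for illustration, not the cosmology adopted in this document.

```python
import math

C_KM_S = 299_792.458        # speed of light in km/s
NU_21CM_MHZ = 1420.40575    # rest frequency of the 21cm line


def hubble(z, h0=70.0, omega_m=0.3):
    """H(z) in km/s/Mpc for flat LambdaCDM (radiation neglected)."""
    return h0 * math.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))


def comoving_distance_mpc(z, h0=70.0, omega_m=0.3, steps=2000):
    """Line-of-sight comoving distance via midpoint integration of c / H(z)."""
    dz = z / steps
    integral = sum(1.0 / hubble((i + 0.5) * dz, h0, omega_m) for i in range(steps)) * dz
    return C_KM_S * integral


def transverse_size_mpc(z, angle_deg, h0=70.0, omega_m=0.3):
    """Comoving size subtended by angle_deg at redshift z (flat geometry)."""
    return math.radians(angle_deg) * comoving_distance_mpc(z, h0, omega_m)


def line_of_sight_depth_mpc(z, bandwidth_mhz, h0=70.0, omega_m=0.3):
    """Comoving depth of a narrow band: c (1+z)^2 dnu / (nu_21 H(z))."""
    return C_KM_S * (1.0 + z) ** 2 * bandwidth_mhz / (NU_21CM_MHZ * hubble(z, h0, omega_m))


def cylinder_volume_gpc3(z, angle_deg, bandwidth_mhz, h0=70.0, omega_m=0.3):
    """Comoving volume of a cylindrical field of angular diameter angle_deg."""
    radius = 0.5 * transverse_size_mpc(z, angle_deg, h0, omega_m)
    depth = line_of_sight_depth_mpc(z, bandwidth_mhz, h0, omega_m)
    return math.pi * radius ** 2 * depth / 1.0e9


if __name__ == "__main__":
    z = 8.0
    print(f"5 deg at z = {z:.0f}: {transverse_size_mpc(z, 5.0):.0f} cMpc across")
    print(f"10 MHz at z = {z:.0f}: {line_of_sight_depth_mpc(z, 10.0):.0f} cMpc deep")
    print(f"cylinder volume: {cylinder_volume_gpc3(z, 5.0, 10.0):.2f} Gpc^3")
```

With these assumed parameters a 5° field at z = 8 comes out somewhat below 1 Gpc across and a
10 MHz band somewhat below 0.2 Gpc deep, giving a volume of order 0.1 Gpc^3, in line with the
rounded numbers in the text.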

Figure 13: Simulated maps of the 21cm signal at two different redshifts. Each map is 425 cMpc/h on a side
(corresponding to ~4°). The two images are drawn from the full cube at z = 7.5 (left, mean ionized
fraction x_i,m = 0.5) and z = 6.8 (right, x_i,m = 0.8). The results have been smoothed with a Gaussian
beam of 2' and a frequency bandwidth of 0.3 MHz. It can be seen that typical structures are captured at
scales of 1-2°. Courtesy of G. Mellema and I. T. Iliev.



A further important point is that although the line-of-sight direction can be well sampled to
measure small-scale fluctuations, the largest scales we can extract are limited by the light cone
effect (Section 3.5). The scale above which this becomes important depends on how fast structures
evolve. Simulations suggest that in the worst case modes below k ~ 0.1 Mpc^-1 are affected
(Datta et al. 2011). The frequency direction is further restricted by foreground removal, which
removes large-scale modes. For the largest wavelength modes (small k values) all sensitivity
thus has to come from the angular modes on the sky.

Another important requirement is that the observational volume needs to be considerably larger
than the characteristic scale of ionized regions during reionization and of heated regions during
the cosmic dawn. Theoretical studies and simulations show that ionized bubbles have
characteristic sizes in the range 1-20 cMpc during reionization, corresponding to angular scales from
below an arcminute up to ~10 arcminutes. This is illustrated in Figure 13, which shows the
ionization, density, 21cm signal, and galaxy distributions through a slice of a numerical
simulation. These panels are 1.2° across at z = 7.32 and give an indication of the structures that SKA
would image. The larger patterns in ionized regions correspond to scales of ~100 cMpc, which
corresponds to angular scales of ~1° (see e.g. Zaroubi et al. 2012). Again, there is a clear
requirement for SKA fields that are several degrees across to provide a representative sampling
of the ionized structures (see also Section 5.3). Large fields further maximize the possibility of
serendipitous discovery of rare objects, for example radio-bright high-redshift objects, within
the main survey volume.

The field of view of the SKA affects imaging and power spectrum measurements differently. For
power spectrum measurements it is important that an individual pointing contains all relevant
k modes, so that the smallest k mode is fixed by the instantaneous field of view. Although
it might be possible to produce a mosaic using multi-beaming, this is unlikely to accurately
reconstruct modes with wavelengths longer than the size of an individual field. For imaging
measurements, which operate in real space, different pointings could be used to stitch together
a larger field. In imaging, we will typically be interested in counted statistics: the numbers and
sizes of bubbles measured from individual images. In this sense, imaging studies have much
in common with traditional galaxy surveys. As a consequence, the field of view requirements
for imaging are likely to be less strict than for statistical measurements. Zaroubi et al. (2012)
argue that since the density fluctuation power spectrum peaks at scales of 120 h^-1 cMpc (~1°),
this will be roughly the scale of the ionized and neutral regions at the midpoint of reionization.
Reionization simulations of a 425 h^-1 cMpc volume confirm this (Iliev & Mellema, private
communication); see Figure 13. To properly capture this scale, an image size of at least ~2°
should be aimed for. To summarize, a FoV ~2° across might be sufficient for imaging, but the
larger FoV ~5° will be vital for power spectrum studies. We will discuss the implications for
the design of SKA-low in Section 5.

Having argued for the need for fields that are at least ~5° in size, we turn to the question of
where these fields should be located on the sky. For EoR searches, the key consideration is to
minimize Galactic radio foreground emission, making fields at high Galactic latitude desirable.
Figure 14 shows the radio sky with the regions observable from the MWA and LOFAR sites. Since
SKA-low will be located at the same site as MWA, its sky is the same as that for MWA. Results
from MWA should help in identifying the best fields with minimal Galactic foreground emission and
polarization.

Beyond minimizing foregrounds, it will be important to ensure that the SKA fields overlap other
astronomical surveys. By 2025, many different galaxy surveys will have covered > 10,000
deg^2 regions of the sky to differing depths in optical and NIR bands. While the majority of these
surveys target galaxies at z < 3, the availability of optical/IR photometry on SKA fields
will be important for the identification of radio-bright high-z quasars for 21cm forest studies
(Section 3.6). Ground-based surveys include BOSS (10,000 deg^2, of which ~7500 deg^2 are in the
Northern Galactic Cap (NGC) and the remainder in the SGC; Eisenstein et al. 2011) and VISTA.
On a similar timescale to SKA, ESA will fly Euclid, which is perhaps the key survey instrument of
comparable performance to SKA. Euclid will focus on areas with |b| > 30° (see Laureijs et al.
2011, section 5.2.3), making this a desirable location for SKA fields.

For direct correlation with SKA 21-cm maps, galaxies at z > 6 are required and these are likely
to be found as Lyman alpha emitters (LAEs). The premier instrument for wide and deep field
searches over the next decade will be the Hyper Suprime-Cam (HSC) on Subaru in Hawaii, which will
see first light in 2012. HSC has a 90' diameter FoV, with a preliminary survey suggested as a
layer cake with 300 deg^2 shallow and 20 deg^2 deep components. Overlap with the HSC deep
field would be critical for LAE-21cm cross-correlation studies (Section 3.8.2).
Figure 14: Brightness temperature of the radio sky at 150 MHz (from Landecker & Wielebinski 1970)
in Galactic coordinates. Contours are drawn at 180 (dashed), 270, 360, 540, 1100, 2200, 3300, 4400,
and 5500 K. The north celestial pole area is cross-hatched. Heavy lines indicate constant
declinations: -26.5°, +35°, and +54°, with dots to mark 2h intervals of time. Star symbols indicate
the coordinates of the four highest redshift (z > 6.2) SDSS quasars (found with the NASA
Extragalactic Database, nedwww.ipac.caltech.edu). Taken from Furlanetto et al. (2006a).

In addition to all-sky CMB surveys such as Planck there are a number of current small-scale
CMB experiments, such as SPT (Schaffer et al. 2011) and ACT (Dunkley et al. 2011), which
are targeting the SZ signal in fields over ~ 1000 deg^2. The possibility of detecting the kSZ-21cm
cross-correlation has been discussed in Section 3.8.3. SPT and ACT are located at the South
Pole and in Chile respectively and so have fields accessible from the SKA site. Overlap of the
SKA field with these CMB fields would allow for cross-correlation searches.
The final choice of SKA observing fields will depend primarily upon the radio sky, but also upon 
the ability to maximize coverage by other wavelength surveys as mentioned above. Deep and 
wide follow up observations with targeted facilities at other wavelengths will also be useful, for 
example with ALMA, TMT/GMT/E-ELT, and JWST. These instruments will all have coverage
of the sky accessible to SKA from the southern hemisphere and so should place few constraints 
upon the choice of SKA field. 



4.2 Foregrounds 



In the frequency range of the CD/EoR experiments (say 40-240 MHz; see Section 5.1) the
foreground emission of our Galaxy and extragalactic sources (e.g. radio galaxies and clusters)
dominates the sky. The amplitude of this foreground emission is 4-5 orders of magnitude stronger
than the expected cosmological 21 cm signal. However, given that the radio telescopes which
are used for the EoR observations are interferometers and hence measure only fluctuations on
given angular scales, the ratio between the measured foregrounds and the cosmological signal
is reduced, typically to 2-3 orders of magnitude (e.g. Bernardi et al. 2009, 2010).



In terms of physics, the foreground emission originates mostly from the interaction between rel- 
ativistic charged particles and magnetic fields, i.e. synchrotron radiation. Galactic synchrotron 
radiation is the most prominent foreground emission and contributes about 70% to the total 
emission at 150 MHz (Shaver et al. 1999). The contribution from the extragalactic synchrotron
radiation from mostly compact sources is ~ 27%, while the smallest contribution (~ 1%) is 
from Galactic free-free emission, i.e. thermal emission of ionized gas. 

The foreground emission is poorly constrained. The only all-sky map in the frequency range
relevant for the EoR experiments is the 150 MHz map by Landecker & Wielebinski (1970),
which has a coarse 5° resolution. The source counts from the 3CRR catalog at 178 MHz (Laing et al.
1983) and the 6C survey at 151 MHz (Hales et al. 1988) are too shallow for the deep EoR
observations. Hence, in the last decade there has been a slew of observational and theoretical
efforts to constrain and to explore the foreground emission.

Observations with the Giant Metrewave Radio Telescope (GMRT) have characterized the visibility
correlation function of the foregrounds (Ali et al. 2008) and have set an upper limit to the
diffuse polarized Galactic emission (Pen et al. 2009). Rogers & Bowman (2008) estimated the
spectral index of the diffuse radio background between 100 and 200 MHz using the EDGES
(Experiment to Detect the Global EOR Signature) antenna. The most recent and comprehensive
targeted observations of the foregrounds have been done by the LOFAR-EoR team, using
the Westerbork Synthesis Radio Telescope (WSRT; Bernardi et al. 2009, 2010) and the Low
Frequency Array (LOFAR; Jelic et al., Labropoulos et al., and Yatawatta et al., in preparation).
These observations indicate that Galactic emission seems to be less prominent than expected by
extrapolating from the higher frequency observations.

Foreground models capable of simulating maps of the foreground emission on arc minute scales
in the frequency range of the EoR experiments are diverse. Jelic et al. (2008) made a first
foreground model that includes both Galactic and extragalactic components of the foreground
emission (see Figure 15). de Oliveira-Costa et al. (2008) used all publicly available total power
radio surveys to obtain all-sky Galactic maps at the desired frequency range. More detailed
simulations of Galactic emission were developed by Sun et al. (2008), Waelkens et al. (2009), Sun
& Reich (2009) and Jelic et al. (2010a), while maps of the extragalactic emission were developed
by Jackson (2005) and Wilman et al. (2008).

Figure 15: Illustration of the various simulated Galactic and extragalactic foregrounds for the
redshifted 21 cm radiation from the EoR. Courtesy of V. Jelic.



4.2.1 Removal 

Once a well-calibrated data set cleaned of corrupting influences (i.e. interference, ionosphere,
beam, etc.) has been obtained, the remaining major challenge will be the separation of the EoR
signal from the astrophysical foregrounds. Foreground removal is generally considered to be
a three-stage process: bright source removal (e.g., Di Matteo et al. 2004; Pindor et al. 2011;
Bernardi et al. 2011), spectral fitting (e.g., Shaver et al. 1999; Santos et al. 2005; Wang et al.
2006; McQuinn et al. 2006; Bowman et al. 2006; Jelic et al. 2008; Harker et al. 2009a, 2010;
Petrovic & Oh 2011; Trott et al. 2012), and residual error subtraction (Morales & Hewitt
2004), though efforts have been made to merge these steps (Gleser et al. 2008; Mao 2012;
Petrovic & Oh 2011).

The foreground fitting is usually done in total intensity along frequency, since: (i) the
cosmological 21 cm signal is essentially unpolarized and fluctuates along frequency; and (ii) the
foregrounds are smooth along frequency in total intensity and might show fluctuations in
polarized intensity (see Figure 16). Thus, the EoR signal can be extracted by fitting out the
smooth component of the foregrounds along the frequency direction.

Figure 16: Behaviour of the EoR signal and the foreground emission along the frequency direction
(Thomas et al. 2009; Jelic et al. 2010a). The foreground removal techniques are based on the
smoothness of the foregrounds in total intensity. The polarized component is not expected to be
smooth, which combined with polarization calibration errors may complicate foreground removal.
Courtesy of V. Jelic.

Fitting out the smooth component can be achieved by using polynomials (e.g., Santos et al. 2005;
Wang et al. 2006; McQuinn et al. 2006; Bowman et al. 2006; Jelic et al. 2008, and references
therein), or more advanced non-parametric methods (Harker et al. 2009a; Chapman et al. 2012).
However, one should be careful in using polynomials. If the order of the polynomial is too low,
the foregrounds will be under-fitted and the EoR signal could be dominated and corrupted by the
fitting residuals. If the order of the polynomial is set too high, the EoR signal could be fitted
out. Hence, in principle it is better to fit the foregrounds non-parametrically, allowing the data
to determine their shape, rather than selecting some functional form in advance and then fitting
its parameters (Harker et al. 2009a; Chapman et al. 2012). In addition, fitting directly to the
visibilities rather than the image cubes might be another avenue to remove foregrounds.
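The sensitivity of polynomial fitting to the chosen order can be illustrated with a toy
line-of-sight spectrum. The sketch below is not any of the published pipelines cited above: the
band, the curved power-law foreground (300 K at 150 MHz, spectral index -2.55), and the 10 mK
white-noise stand-in for the cosmological signal are all assumed values chosen for illustration,
and `fit_foreground` is a hypothetical helper name.

```python
import numpy as np


def fit_foreground(freq_mhz, temp_k, order):
    """Fit a polynomial of the given order in log(frequency)-log(temperature) space
    and return the smooth foreground model in K. Requires temp_k > 0."""
    x = np.log(freq_mhz / 150.0)
    coeffs = np.polyfit(x, np.log(temp_k), order)
    return np.exp(np.polyval(coeffs, x))


rng = np.random.default_rng(0)
freq = np.linspace(120.0, 180.0, 120)                 # MHz, assumed band and channelisation
x = np.log(freq / 150.0)
foreground = 300.0 * np.exp(-2.55 * x - 0.1 * x**2)   # K, curved power law (assumed)
signal = 0.01 * rng.standard_normal(freq.size)        # K, 10 mK rms mock "EoR" fluctuations
data = foreground + signal

for order in (1, 2, 6):
    residual = data - fit_foreground(freq, data, order)
    corr = np.corrcoef(residual, signal)[0, 1]
    print(f"order {order}: residual rms = {residual.std() * 1e3:8.2f} mK, "
          f"correlation with input signal = {corr:.3f}")
```

In this toy case a first-order fit leaves foreground residuals far above the mock signal, whereas
a second-order fit (which matches the assumed foreground shape) recovers it; raising the order
further gives the fit more freedom to absorb part of the signal itself.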

All current EoR radio interferometry arrays have an instrumentally polarized response, which
needs to be calibrated. If the calibration is imperfect, some part of the polarized emission is
transferred into total intensity and vice versa. As a result, such leaked polarized emission can
mimic the cosmological signal and make its extraction more challenging (Jelic et al. 2010a;
Geil et al. 2011). Although this could be a problem when analysing the intensity maps, no
methods of foreground extraction have yet been implemented that take this effect into account.
Future analysis including polarised data should establish how much this polarised leakage has
to be controlled in order for proper foreground subtraction to be performed.
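The leakage mechanism can be made concrete with a minimal Jones-matrix example. The sketch below
assumes ideal linear feeds and a single, purely hypothetical calibration error: an uncorrected
gain mismatch g between the X and Y signal paths. The sky values (I = 2 K, Q = 1 K) and g = 0.5%
are illustrative numbers, not measurements.

```python
import numpy as np


def coherency(i, q, u, v):
    """Coherency matrix for linear feeds: B = 0.5 * [[I+Q, U+iV], [U-iV, I-Q]]."""
    return 0.5 * np.array([[i + q, u + 1j * v],
                           [u - 1j * v, i - q]])


def observed_stokes_i(b, jones):
    """Apparent Stokes I = XX + YY after the sky passes through the Jones matrix."""
    b_obs = jones @ b @ jones.conj().T
    return (b_obs[0, 0] + b_obs[1, 1]).real


g = 0.005                               # assumed residual X/Y gain mismatch (0.5%)
jones = np.diag([1.0 + g, 1.0 - g])     # uncorrected instrumental response
i_sky, q_sky = 2.0, 1.0                 # K, hypothetical total and polarized intensity

with_q = observed_stokes_i(coherency(i_sky, q_sky, 0.0, 0.0), jones)
without_q = observed_stokes_i(coherency(i_sky, 0.0, 0.0, 0.0), jones)
leak = with_q - without_q               # equals 2 * g * Q for this Jones matrix

print(f"Stokes Q leaking into Stokes I: {leak * 1e3:.1f} mK")
```

For this simple error model the leaked term is 2gQ, so a 1 K polarized structure and a 0.5% gain
mismatch already contribute at the 10 mK level in total intensity.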
The foreground properties required of the EoR fields include:

• high Galactic latitudes with low Galactic radio emission and polarization;

• minimal Galactic or extragalactic emission on any scale;

• minimal power in the foreground structure at angular scales of 10'-30';

• no complex bright radio sources within or near the edges of the field.

One strategy for selecting such fields is to start from a shallow all-sky survey and then select
increasingly deep, but smaller, fields. This rapidly focuses on the best possible fields and at
the same time creates a "wedding-cake" all-sky survey that can act as a global sky model for
calibration and side-lobe leakage removal.



4.3 Ionosphere 



One of the major distortions of the long wavelength radio-wave signal coming from cosmic
sources is caused by the Earth's ionosphere and possibly even by the troposphere (Hewish
1951, 1952). Ionized gas causes changes in both the phase and the amplitude of the radio wave.
These changes are direction-dependent and can act differently on left- and right-hand polarized
waves due to Faraday rotation caused by the Earth's magnetic field. If not corrected for,
especially at frequencies approaching the ionospheric plasma frequency of around 5-10 MHz, the
resulting image will be heavily distorted (see e.g. Cohen & Rottgering 2009) by an "ionospheric
point-spread function" (e.g. Koopmans 2010).

One can look at the effect of the ionosphere in the following way (see e.g. Ratcliffe 1956): The
sky can be described by an (infinite) set of points, each emitting a (Gaussian) random signal. The
expectation value of the electric field squared is the source intensity and the random signals from
different directions do not correlate (e.g. Thompson et al. 2001). Each point emits a spherical
wave, which just above the Earth's ionosphere/atmosphere can be assumed planar. This plane wave
is distorted while traveling through the ionosphere. Under the first-order Born approximation
(see footnote 5) one can approximate the plane-wave distortion by integrating the index of
refraction of the ionosphere along straight lines.
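As an illustration of the straight-line integration described above, consider the standard
textbook case of a cold, unmagnetized plasma observed well above the plasma frequency, for which
the refractive index is n ≈ 1 - ν_p²/(2ν²). Integrating (n - 1) along the line of sight gives a
phase shift proportional to the total electron content (TEC) and inversely proportional to
frequency. The sketch below evaluates this relation; the plasma model and the example TEC values
are assumptions made for illustration and are not taken from the references above.

```python
import math

E_CHARGE = 1.602176634e-19      # elementary charge [C]
M_ELECTRON = 9.1093837015e-31   # electron mass [kg]
EPS0 = 8.8541878128e-12         # vacuum permittivity [F/m]
C_LIGHT = 299792458.0           # speed of light [m/s]
TECU = 1.0e16                   # 1 TEC unit [electrons / m^2]


def ionospheric_phase(tec_tecu, freq_hz):
    """Phase shift [rad] from integrating (n - 1) along a straight line of sight,
    assuming n = 1 - nu_p^2 / (2 nu^2), valid only for nu well above the plasma frequency."""
    k = E_CHARGE**2 / (4.0 * math.pi * EPS0 * M_ELECTRON * C_LIGHT)
    return -k * tec_tecu * TECU / freq_hz


if __name__ == "__main__":
    for freq_mhz in (50.0, 150.0, 240.0):
        dphi = ionospheric_phase(0.01, freq_mhz * 1e6)   # 0.01 TECU differential TEC (assumed)
        print(f"{freq_mhz:5.0f} MHz: differential phase = {dphi:+.3f} rad")
```

The 1/ν scaling is the reason phase distortions grow rapidly towards the low end of the CD/EoR
band: in this approximation the same differential TEC produces three times the phase error at
50 MHz as at 150 MHz.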

It can be shown that the spatially-varying phase distortion in the direction of a point source
is proportional to a slice, perpendicular to the line-of-sight to the source, through the
three-dimensional Fourier transform of the ionospheric refractive index (Koopmans 2010). For a
wide field of view, these slices are tilted with respect to each other, causing their phase
distortions to become increasingly uncorrelated over the field-of-view for a thick ionosphere (see
e.g. Cohen & Rottgering 2009). The equivalent concept in adaptive optics (AO) is the isoplanatic
patch, over which a single bright source can be used for AO corrections. One can show that the
three-dimensional nature of the ionosphere then becomes important: looking at large angles
away from the phase-center, one sees structure in the ionosphere as a function of height, under
an oblique angle. For a 2D phase screen this is not the case, except for a very gradual change
in projected density. As a result, directionally dependent phase solutions are necessary for
low-frequency wide-field arrays such as SKA, but also for present-day low-frequency arrays
such as PAPER, MWA, GMRT and LOFAR.

5 The physical distance by which the wave-vector deviates, as it travels through the ionosphere,
from the straight line it would follow in the absence of the ionosphere is smaller than the
dominant scales that cause phase distortions of the plane wave.

Figure 17: A zoom-in on a LOFAR 150 MHz observation of the field around 3C196, showing what are
presumably residual ionospheric effects around several brighter sources in the field. Note the
correlation between the effect over the relatively small distance between the sources. These
residuals are due to directionally-dependent ionospheric distortions on time-scales currently
below the shortest solution interval in the calibration of the data. Taken from Labropoulos et
al., in preparation.



Some of the issues that therefore need to be confronted by any deep EoR observation with SKA 
are listed below: 

• Signal-to-Noise: The ionosphere changes on time-scales of tens of seconds, which sets
a maximum integration time beyond which visibilities start to de-correlate at some level
(Bourgois 1981). Phase distortions correlate strongly between wavelengths, which can
be used to compensate for the short integration time. However, for a given bandwidth, the
maximum integration time sets the S/N ratio of the images (or visibilities) and therefore
the typical distance between brighter calibrator sources in the FoV. The lower the S/N
ratio, the further apart the calibrator sources will be. If the distance between sources
becomes too large, the EM phases of weaker sources in between them can partly de-correlate,
causing them to be smeared by "seeing" (see Figure 17). It is unclear at which
level this seeing (i.e. blurring on arc minute scales) will show up and what effect it will
have on the dynamic range of deep images with SKA. For example, currently a dynamic
range of about a million to one has been reached with LOFAR on fields with a bright
(80 Jy) compact source (e.g. Labropoulos et al. 2012 in prep.; Yatawatta et al. 2012
in prep.) using solution time-scales of ~10 min. However, one still needs to go a factor
of ~10-100 deeper to reach mK sensitivity levels with SKA. Whether ionospheric
seeing will limit this ability requires further studies. The effect of seeing can however
be further alleviated by a larger collecting area of the array (i.e. calibrator sources can be
fainter and thus closer together) and by the use of longer baselines (i.e. source confusion
is limited in the modeling and ionospheric corrections). The question for SKA is therefore
what signal-to-noise per integration time and bandwidth is required to enable ionospheric
corrections over the entire image FoV to a level of ~ 1 mK per few arcmin beam. It
should also be investigated whether these effects are smeared out sufficiently and can be
subtracted in the foreground removal process (e.g. in optical ground-based images
PSF-blurred point sources can be removed without deconvolution as long as the noise is not
speckle noise but thermally dominated; whether the latter holds in radio astronomy at the
depth of current or future low-frequency arrays is unclear).

• Three-dimensional Ionosphere: As mentioned above, the ionosphere is not well described
by a 2D phase screen and requires 3D modeling at low frequencies and over wide fields
of view. Deep observations e.g. with LOFAR (e.g. Labropoulos et al. 2012 in prep.;
Yatawatta et al. 2012 in prep.) and the GMRT (e.g. Cohen & Rottgering 2009) show
strong directionally dependent image distortions below a few hundred MHz. This indicates
that, over the wide FoV of these arrays, the 3D nature of the ionosphere cannot be
neglected. A number of different approaches (or suggested approaches) have been taken
thus far (e.g. Intema et al. 2009; Matejek & Morales 2009), among which are multiple
phase screens, "rubber-sheet" models (which correct mostly for source motion, i.e.
equivalent to tip-tilt corrections in the optical), interpolation of Jones matrices for
calibrator sources and 3D electron-density modeling. It remains unclear which, if any, of
these approaches is the best compromise between complexity, computational speed and
being physically correct. In addition, the effect of Faraday rotation might be more serious
than expected, causing non-zero XY visibilities for unpolarized sources due to differential
Faraday rotation on baselines as short as tens of kilometers, as seen with LOFAR (A.G. de
Bruyn, private communication). If not accounted for, this might cause artificial
polarization of the sky. All these, and possibly as yet unknown effects, need to be accounted
for to a level of μJy on baselines of a few km, where the EoR signal is expected to be seen with
SKA-low.

• Long Baselines: Another important question that requires addressing is whether long
baselines are required for SKA-low to correct for ionospheric effects on short baselines
(where the EoR signal dominates). Ionospheric distortions should be smaller on shorter
baselines because the ionosphere correlates much more strongly over small physical distances.
The outer scale of the ionosphere can be tens of kilometers and the inner scale meters (e.g.
Thompson et al. 2001; Cohen & Rottgering 2009). In between these scales the power
spectrum is a steeply declining power law. Hence most phase distortions are caused by
large-scale structure (i.e. many kilometers). One notes however that the field-of-view of
many current arrays, projected on the ionosphere, is tens of km (a few degrees) to hundreds
of km (tens of degrees), incorporating many less correlated ionospheric structures and causing
directionally dependent phase structure.

However, on short baselines where these effects are smaller, the Galactic foregrounds are
much brighter and source confusion is much larger. So small phase errors can be very hard
to distinguish from a change in the sky model and small errors could cause leakage of the
foreground emission into the EoR signal. Any such leakage at the level of μJy per few arc
minutes in the emission over a ~1 MHz bandwidth would be detrimental to the imaging
of the EoR signal (e.g. Jelic et al. 2008). In this situation, the use of long baselines
has many advantages: (i) the sky on long baselines is far simpler and consists mostly
of compact easy-to-model sources; (ii) source confusion is far lower and on baselines of
tens of kilometers can be below the EoR signal, so that confusion noise can be avoided in
the process of calibration; (iii) compact bright source structure is far easier to determine
using long baselines, allowing these sources to be subtracted from the shorter baselines
without leaving residuals; (iv) long baselines also allow sources to be seen along many
different angles through the ionosphere, helping dramatically in the determination of the
3D structure of the ionosphere. See Section 5.2.3 for more discussion as well.



4.4 Radio Frequency Interference (RFI) 

The increasing demand for commercial usage of the electromagnetic spectrum makes it more 
difficult to carry out interference-free astronomical observations. The SKA will be larger with 
significantly more receiving elements and collecting area, and will thus be more sensitive than 
any existing radio interferometer. In addition, it will have large fields of view, large observa- 
tional bandwidths with high resolution and the capability to simultaneously observe in multiple 
directions ('multi beaming'). All this brings new challenges to the data processing, including 
RFI excision. 

Interference can occur due to a variety of reasons such as (but not limited to) sparking ignition 
systems, arcing sources, high-voltage power lines, satellite systems, the active Sun, malfunc- 
tioning receivers, incorrect observing parameters, communication systems, lightning, meteors, 
cars, trains and airplanes, etc. RFI has a complex time-frequency-polarization structure with a 
very high dynamic range in amplitude. Even though several signal processing methods are in 
use to counteract RFI, in practice there is no universal fool-proof technique. In view of this, RFI 
mitigation is achieved by using a combination of several engineering practices and techniques. 
RFI mitigation is generally carried out at three principal stages of astronomical data processing,
namely real-time pre-detection and pre-correlation processing, real-time post-correlation
processing, and off-line processing (Fridman & Baan 2001; Bell et al. 2000). In practice it is
applied at several points in the data flow: on the raw visibilities (or possibly even directly
on the EM signal), on the calibration solutions, where subtle errors may surface and be detected,
and then on the calibrated visibilities obtained after applying calibration.

For the large data sizes generated by telescopes like SKA, a fully automatic, computationally
efficient scheme needs to be developed with the aim of achieving effective, reliable and
accurate RFI excision. Several examples of such automatic data-flagging systems exist
(Offringa et al. 2010; Pearson 2002), but there are situations where manual intervention is still
required for excising very subtle errors which escape automatic RFI excision. For SKA, an
approach that exploits the natural strengths of the various signal processing techniques and
applies them judiciously at the different stages of data processing will be required.

4.4.1 RFI environment and statistics 

The number of interference points in the data varies with the site where the telescope is located,
apart from other factors like in-house generated interference. For example, in LOFAR data
the typical amount of data affected by RFI is about 3 to 4% within the 120-240 MHz range.
Typically the RFI detected is narrow-band and has a bandwidth of less than 2 kHz.
A systematic study of detected RFI statistics for several days (distributed between 1994-1999)
of astronomical observations at 151.5 MHz with the Mauritius Radio Telescope (MRT) revealed
that the number of interference points falls monotonically with the strength of interference
(Golap et al. 1998). This illustrates the important point that a substantial low-level (close
to the thermal noise) RFI population can still exist in the data. Such low-level RFI may need to
be detected and dealt with for sensitive experiments such as EoR studies. Although SKA-low
will be located in a remote area, astronomical observations can still be expected to be affected
by RFI, especially at low levels, from satellites, reflections from meteor trails,
reflections from the ionosphere of ground-based transmitters, airplanes, etc.



4.4.2 RFI mitigation for SKA 

The following issues may be relevant for RFI mitigation at SKA. Foremost we need to reiterate 
that even if a good automatic RFI mitigation system can be developed, it will require substantial 
computational efforts and may imply some loss in signal integrity. Furthermore the future of RF 
allocations is difficult to predict and as a result so is the RFI environment. For the cosmological 
21cm signal, it is important to study the effects and minimize any change in the statistical 
properties of the data which may be caused by the RFI mitigation system. 
It is beneficial to have SKA located in as radio-quiet a zone as possible, although results with
LOFAR show that some RFI can be dealt with even in urban environments (Offringa 2012).
Narrow-band RFI excision is relatively easy. Studies with LOFAR have shown that flagging
techniques can excise RFI from a large fraction of the EoR imaging frequency range with very
low levels of data loss (typically 3-4%), because of LOFAR's high time and frequency resolution
of around 1 s / 1 kHz. Compared to this, the typical data loss is about 10% for astronomical
observations with the MRT, mostly due to the poor spectral resolution of 1 MHz. Therefore, it
is imperative that SKA has high spectral and time resolution.
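The link between spectral resolution and data loss can be illustrated with a toy flagger. The
sketch below is not the algorithm used for LOFAR or the MRT; it is a simple global threshold on
the median absolute deviation (MAD), applied to simulated data with one persistent narrow-band
transmitter. The array sizes, noise level, transmitter strength and threshold are all assumed
values, and `flag_rfi` is a hypothetical function name.

```python
import numpy as np


def flag_rfi(amplitude, threshold=6.0):
    """Flag samples deviating from the global median by more than `threshold` robust sigma."""
    med = np.median(amplitude)
    sigma = 1.4826 * np.median(np.abs(amplitude - med))   # MAD scaled to a Gaussian sigma
    return np.abs(amplitude - med) > threshold * sigma


rng = np.random.default_rng(1)
n_time, n_chan = 200, 256
data = rng.normal(1.0, 0.1, size=(n_time, n_chan))        # noise-like visibility amplitudes
data[:, 40] += 5.0                                        # transmitter confined to one fine channel

# High spectral resolution: only the contaminated channel is lost.
flags_fine = flag_rfi(data)

# Low spectral resolution: average 16 fine channels before flagging, so the
# transmitter contaminates (and removes) a full coarse channel.
coarse = data.reshape(n_time, n_chan // 16, 16).mean(axis=2)
flags_coarse = flag_rfi(coarse)

print(f"data lost at fine resolution:   {flags_fine.mean():.2%}")
print(f"data lost at coarse resolution: {flags_coarse.mean():.2%}")
```

With the same interferer, the coarse channelisation loses a sixteen times larger fraction of the
band, which is the qualitative behaviour behind the LOFAR and MRT figures quoted above.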

The frequency ranges of FM stations (87-108 MHz) and DAB stations (180-230 MHz) are
important for imaging the EoR, but it would be extremely difficult to use them (even after
flagging) since transmitters might be seen continuously and occupy many spectral channels.
Even in remote areas, the signals that are generated by the radio stations might need to be
excised from the data to be able to image the EoR. Techniques currently in development that
could suppress the transmitters, such as spatial filters and/or cyclostationary filters, might
suffice but have never been applied on such a scale. Further research into such methods is
therefore pressing, and the possibility of extracting the EoR signal from non-contiguous spectra
needs to be further investigated as well. Furthermore, RFI should be an important consideration
in deciding the number of ADC (analog to digital converter) bits required for SKA and how many
bits are needed in the digital processing stages after digitization. Even with a radio-quiet
site, the signal path should remain linear in the presence of strong transmitters. Additionally,
its band-pass filters should be designed to block strong interference and attenuate out-of-band
interference. After all, even the most radio-quiet sites will see satellites and air traffic, and
the future radio environment might look different (Boonstra et al. 2009).
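A first-order feel for the ADC bit-depth question comes from the textbook relation for an ideal
N-bit quantizer driven by a full-scale sine wave, whose signal-to-quantization-noise ratio is
approximately 6.02 N + 1.76 dB. The sketch below simply evaluates this relation; it ignores
real-world ADC imperfections (the effective number of bits is lower in practice), and the 60 dB
headroom used in the example is a hypothetical figure, not an SKA requirement.

```python
import math


def ideal_adc_snr_db(bits):
    """Ideal signal-to-quantization-noise ratio [dB] for a full-scale sine wave."""
    return 20.0 * math.log10(2.0 ** bits) + 10.0 * math.log10(1.5)


def bits_for_headroom(headroom_db):
    """Smallest number of bits whose ideal SNR reaches the requested headroom [dB]."""
    bits = 1
    while ideal_adc_snr_db(bits) < headroom_db:
        bits += 1
    return bits


if __name__ == "__main__":
    for n in (8, 12):
        print(f"{n:2d} bits: ideal dynamic range = {ideal_adc_snr_db(n):.1f} dB")
    # Hypothetical case: a transmitter 60 dB above the level that must stay unclipped.
    print(f"bits needed for 60 dB headroom: {bits_for_headroom(60.0)}")
```

Each additional bit buys about 6 dB, so the strength of the brightest transmitter expected in
the band translates fairly directly into a minimum bit depth for a linear signal path.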



Depending on the algorithm used, the order in which the data are stored is relevant for the
speed of RFI mitigation, and the process of data reordering can dominate the computational
costs. Given the amount of data to process, RFI mitigation not only has to be automatic
but might have to be integrated with the calibration and imaging process so as to minimize
the input-output load. Since RFI mitigation, calibration and imaging usually require the
data in different orders, it would be very useful to formulate schemes/algorithms in which the
three processes are synchronized for computational efficiency. Continuous
monitoring should be carried out to generate valuable statistical data on RFI, which will be of
additional help in combating it.



4.5 Calibration 

The cosmological 21cm signal is weak compared to the foregrounds at all frequencies which 
are relevant for the study of the EoR and Cosmic Dawn, but more so at lower frequencies. 
Although it should be in principle possible to subtract these foregrounds, in order to do so, the 
signal should have a high degree of accuracy, or in other words, it needs to be calibrated to a 
level which allows the subtraction of foregrounds to the level that the cosmological 21cm signal 
can be detected. Calibration in this context thus is the correction of errors introduced by the 
propagation path and the instrument, as well as the removal of bright celestial sources from the 
data. After this first step, more specialized EoR specific data processing can be carried out (see
Section 4.2).



There are many aspects connected to calibration and here we only want to list a few important 
points. The major sources of errors in a typical EoR observation can be categorized as follows: 



• Atmosphere: Ionosphere and troposphere (see Section 4.3).



• Receiver beam shape: The phased array beams to be used in SKA stations are formed
by coherent combination of multiple receiver elements (dipoles). During a synthesis ob- 
servation, in order to track a given direction in the sky, the beam forming weights have 
to be varied. This will inevitably result in the variation of the beam shape over the full 
field of view (although it remains fairly constant along the tracking direction). More- 
over, the beam shape of each station typically differs somewhat from the others, due to 
different element layouts as well as effects such as mutual coupling. Images made us- 
ing such varying and heterogeneous beams will introduce distortions, particularly at the 
edges of the field of view. Apart from this, grating lobes could appear far away from the 
main beam beyond a certain frequency range, depending on the element configuration. 
Strong sources passing through grating lobes can act as sources of interference. The el- 
ement beam will have a strong polarization response, necessitating full polarimetric data 
models. 

• Receiver signal path: A cascade of amplifiers and other signal processing units makes up
the path connecting a single station with the correlator. The properties of such units vary
and have to be corrected for, both before and after correlation. For instance, in LOFAR, a
major source of error in the signal path was station clocks being slightly out of
synchronization. This has now been fixed through the implementation of a common clock
for the core area. SKA may want to follow a similar strategy.

• (Unmodeled/Imperfectly modeled) Celestial sources: A requirement for obtaining satisfactory
calibration is (at least partial) knowledge of the sky being looked at. This model
is iteratively updated during calibration and subsequent imaging. In particular, compact
and bright foreground sources have to be modeled accurately. The model should not only
have the intensity and polarization of each source, but also the shape information. In
almost all cases some sources will have some structure above or around the resolution
scale. Therefore, having high resolution data (with longer baselines) is crucial for such
sources. Short baselines are also affected by emission from the Galactic plane.

• Smearing: Due to limited capabilities to process (correlate/calibrate) as well as store data, 
some form of averaging has to be performed, both in time and in frequency. This distorts 
images, particularly those with a wide field of view. 

• Closure errors: Errors that cannot be decomposed as belonging to stations are called 
closure errors. There are various causes for closure errors. For instance, nonlinearities 
introduced at the receiver frontend (saturation, quantization errors) could cause closure 
errors. Furthermore, imperfect models of bright extended sources used in calibration 
would also introduce such errors. 

Ideally, calibration will be able to reduce all these errors to below the theoretical noise level. 
However, some of these errors are controllable and can be reduced by a careful system design 
(e.g., the clock synchronization) or handling of the data (e.g., smearing due to averaging). 
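
The distinction between station-based errors and closure errors can be illustrated with a closure phase, i.e. the phase of the product of the visibilities around a triangle of stations. The sketch below is a toy example under stated assumptions: the visibilities and gains are random placeholder numbers, and the 0.3 rad baseline error is arbitrary. It shows that errors which factor into one complex gain per station cancel in the closure phase, whereas an error attached to a single baseline does not.

```python
import cmath
import random

def closure_phase(vis, i, j, k):
    """Phase (radians) of the triple product V_ij * V_jk * V_ki."""
    return cmath.phase(vis[(i, j)] * vis[(j, k)] * vis[(k, i)])

def wrap(angle):
    """Wrap an angle to the interval (-pi, pi]."""
    return cmath.phase(cmath.exp(1j * angle))

random.seed(1)
stations = (0, 1, 2)

# Hypothetical "true" visibilities on the three baselines of one triangle.
true_vis = {}
for i, j in [(0, 1), (1, 2), (2, 0)]:
    v = complex(random.gauss(0, 1), random.gauss(0, 1))
    true_vis[(i, j)] = v
    true_vis[(j, i)] = v.conjugate()

# Station-based errors: V_ij -> g_i * conj(g_j) * V_ij, one complex gain per station.
gains = {s: cmath.rect(random.uniform(0.5, 1.5), random.uniform(-3.0, 3.0))
         for s in stations}
station_vis = {(i, j): gains[i] * gains[j].conjugate() * v
               for (i, j), v in true_vis.items()}

# Baseline-based (closure) error: an extra 0.3 rad phase on baseline (0, 1) only.
baseline_vis = dict(station_vis)
baseline_vis[(0, 1)] = station_vis[(0, 1)] * cmath.exp(0.3j)
baseline_vis[(1, 0)] = baseline_vis[(0, 1)].conjugate()

phi_true = closure_phase(true_vis, 0, 1, 2)
shift_station = wrap(closure_phase(station_vis, 0, 1, 2) - phi_true)
shift_baseline = wrap(closure_phase(baseline_vis, 0, 1, 2) - phi_true)
print(f"closure-phase shift, station errors: {shift_station:+.3f} rad")
print(f"closure-phase shift, baseline error: {shift_baseline:+.3f} rad")
```

Station-based calibration can absorb the first kind of error completely; the second kind survives it, which is why such errors must be controlled in the system design or in the sky model.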



4.6 Selected results from SKA precursors and pathfinders 

Here we summarize some results from SKA precursors and pathfinders relevant for CD/EoR 
science. The relevant telescopes are WSRT, LOFAR, MWA, PAPER and GMRT. 

• WSRT: Observations with the WSRT-LFFE (Low Frequency Front End) served as a 
pathfinder for the LOFAR-EoR experiment and hence for SKA as well. The observed 
fields were centered at the quasar 3C196 (80 Jy in peak intensity at 150 MHz) and the 
North Celestial Pole (NCP, brightest source 5 Jy at 150 MHz). Both fields are well away 
from the Galactic plane and are also target fields for the LOFAR-EoR observations. A dynamic
range of about 150,000 to 1 was reached in the 3C196 field (Bernardi et al. 2010).
The limitations were mainly due to source confusion and ionospheric variations. In the
3C196 field, off-axis sources could be removed with an accuracy better than 1%. Polarization
was calibrated in a direction-independent fashion by solving for the off-diagonal
elements of the Jones matrix. Given the equatorial mount of the WSRT, a single solution
was usually sufficient to obtain a polarization accuracy at the 0.5% level throughout the
whole 12h synthesis. Since there is no well-established polarized beam model for WSRT at very
low frequencies, instrumental polarization of point sources was corrected by fitting their
response in the image plane, assuming that all their Stokes Q and U signals were due to
instrumental polarization. This set the first limit on the power spectrum of the Galactic
foregrounds (Bernardi et al. 2009, 2010).



• LOFAR: During the commissioning phase, LOFAR regularly observed the same two
fields as were previously observed with WSRT, namely 3C196 and NCP. The frequency
range used was 115-165 MHz. With an effective integration time of about 10 hours, a
noise level of about 100 μJy (NCP) and a dynamic range of about 0.5 million (3C196)
were reached. The main limitations currently are imperfect knowledge of the sky
model, as well as ionospheric and beam shape errors. A noteworthy fact is that the noise
level reached is within a factor of 1.5-2 of the theoretical limit. The increase in sensitivity
compared to WSRT, the more advanced calibration techniques (in particular
directionally-dependent calibration) and the longer baselines out to ~100 km allowed
for these substantially better results.

Figure 18: Deep observations of a small area in the 3C196 field at 150 MHz. Left panel: LOFAR, Right
panel: WSRT. Taken from Labropoulos et al., in preparation and Bernardi et al. (2010).



• MWA: The Murchison Widefield Array (MWA, Lonsdale et al. 2009) is currently under
deployment. A 32-tile prototype has operated since 2010 and data taken with the 32T array
are being processed. An early demonstration of the real-time calibration and imaging
pipeline was obtained on a field centred on Pictor A (Ord et al. 2010). This observation
had an effective integration time of 8 hours and 100 MHz frequency
coverage. The full instrument deployment is expected during 2012.



• PAPER: The Precision Array to Probe the Epoch of Reionization (Parsons et al. 2010) is
currently under deployment in South Africa. PAPER employs individual isolated dipoles
and a novel calibration technique based on delay/delay-rate filters (Parsons & Backer 2009).
Given the limited collecting area, PAPER will be reconfigured into a maximum
redundancy configuration in order to achieve the deepest sensitivity on a limited range of
k modes (Parsons et al. 2012). Since it does not employ any beam-forming, the observations are
done in drift scan mode. It currently consists of 64 elements with the ambition to grow to
128 or even 256 elements.



• GMRT: The GMRT reionization experiment is an ongoing effort to statistically detect the
neutral hydrogen 21-cm signal at 150 MHz. Ali et al. (2008) characterized the foregrounds
on sub-degree angular scales at this frequency and found that the measured multi-frequency
angular power spectrum is roughly in agreement with the expected value (Datta et al.
2007). They also found the foregrounds to be oscillatory in frequency. This measured
oscillatory behavior can be reduced by suppressing the side-lobe response of the primary
antenna elements. Ghosh et al. (2012) found that the suppression works best at the scales
for which there is a dense sampling of the uv-plane. These authors also measured the
fluctuations in the Galactic diffuse emission at the 10' scale after removing bright radio
sources. Pen et al. (2009) carried out 150 MHz GMRT observations at a high Galactic
latitude to place an upper limit of $\sqrt{\ell^2 C_\ell / 2\pi} < 3$ K on the polarized foregrounds at
$\ell < 1000$. Paciga et al. (2011) placed a new upper limit on the 21-cm power spectrum during
the EoR. Removal of RFI, compensation for ionospheric disturbances and proper calibration
of radio sources are among the major challenges that the GMRT currently faces,
though efforts are ongoing to overcome some of the hurdles (Roy et al. 2010; Prasad &
Chengalur 2012).



4.7 Some lessons on foregrounds learned from current SKA pathfinders 

The early observations from the telescopes mentioned above are still quite far from their ulti- 
mate goal of detecting any EoR signal. However, the following aspects important for reaching 
a detection have already become clear. 

• The wide fields of view that are observed include thousands of celestial sources and at- 
mospheric/instrumental corruptions that significantly vary with direction, time and fre- 
quency. Therefore, an efficient and accurate calibration along different directions is es- 
sential. Until recently, (sequential/simultaneous) calibration approaches based on the 
concept of peeling were the only methods available for such calibration. However, recent
developments have provided substantial improvements both in computational cost
and accuracy (Yatawatta et al. 2008; Kazemi et al. 2011).



• Subtraction of bright sources (down to the noise level; see e.g. Trott et al. 2012) requires
construction of accurate source models. In a typical observation, a few complicated
sources and thousands of point, double, and triple sources can be seen. Accurate models
are needed not only for the complicated extended sources but also for the thousands
of weaker compact sources. In particular, orthonormal basis functions such as shapelets
(Yatawatta 2010) and prolate spheroidal wave functions (Yatawatta 2011) provide efficient
ways of representing extended sources. In order to construct these accurate models,
long baselines are essential.

• Although it is popular to quote the dynamic range reached as an indicator of the quality
of the calibration process, this quantity is not an entirely correct
criterion for measuring the quality of EoR observations (see e.g. Braun 2012). If the field
contains a very bright source, chosen for calibration purposes, one can obtain a very high
dynamic range without actually reaching the theoretical noise limit. On the other hand,
in a field with only weak sources, the maximum achievable dynamic range is relatively
low but the noise limit could be reached, provided the calibration is done properly. At this
point it is hard to make firm statements on whether the presence of a bright source in the
field is to be preferred over the absence of one, although LOFAR observations of the 3C196
(with a bright source) and NCP (without a bright source) fields seem to suggest that in both
cases the thermal noise can be reached.

• For accurate source model construction and calibration along multiple directions,
it is important to achieve a high, preferably full, uv coverage.

• Full polarimetric calibration is essential to handle the element beam polarization response 
as well as differential ionospheric Faraday rotation. 

• Finally, experience with LOFAR has shown that long baselines greatly benefit directionally- 
dependent calibration, modeling of the sky as well as ionospheric corrections. 

In the next section we further examine the basic requirements for an SKA-low that allow the 
science as outlined in the earlier sections to be accomplished. 

5 Implications for SKA design



Having presented (i) an overview of the science one might envisage doing with SKA-low, (ii) 
how one might want to do it and some of the observational aspects relevant for this, we now 
give an overview of what this implies for the design and lay-out of SKA-low. The critical issues 
that need to be considered for Cosmic Dawn/EoR science are the following: 

• Frequency Coverage: This sets the redshift range over which the HI signal from cosmic 
dawn and epoch of reionization can be observed. Current observational and theoretical 
constraints set this frequency range. 

• Antenna Distribution and uv-coverage: The distribution of antennas is important for
the detection of the EoR signal on short baselines (up to a few kilometers) and, using
especially the longer baselines, for calibration of the instrument, correction for ionospheric
effects, reduction of confusion noise, determination of the structure of bright compact
sources and foreground removal. In addition, instantaneous uv-coverage sets limits on
the number of antennas for a given collecting area and core area.

• Field of View, Multi-beaming and Station Size: The FoV of the smallest array element 
for which visibilities are stored (e.g. dipole, tile, station) determines the largest scale for 
which information can be retrieved (e.g. through imaging or power spectra, etc.). The 
largest scale needed for CD/EoR science therefore sets the minimal FoV. Multi-beaming 
can increase the total FoV, but cannot recover fluctuations and structures on scales larger 
than the single beam FoV (without substantial computational cost). 

• Collecting area or $A_{\rm eff}/T_{\rm sys}$: This sets the overall sensitivity of the array both for deep
multi-epoch imaging (i.e. tomography) and instantaneous signal-to-noise for calibration
purposes. In combination with the antenna distribution it also sets the sensitivity for
power spectrum measurements.

We discuss each of these in more detail in the following sections. 

5.1 The Frequency Coverage 

In this section we present the optimal frequency range inferred from current knowledge about
the CD and EoR (see footnote 6). The redshift/frequency range proposed here is well motivated,
especially at low redshifts by observations of the Gunn-Peterson effect, and at higher redshifts
through better understood physics and theoretical models that reproduce the current
observational constraints at z ≲ 10.

6 This section is based on a memo written by several of the co-authors (LVEK and BS) as part of the SKA 
Science Working Group to inform the SKA Project Office on the optimal frequency range(s) for high-redshift HI 
studies. 

5.1.1 Upper and lower limits



The upper frequency limit for observations of neutral hydrogen during the CD/EoR is set by the
time when the Universe becomes (nearly) fully ionized again, at $z_{\rm low} = (\nu_{21}/\nu_{\rm up} - 1)$, where
$\nu_{21} = 1420$ MHz. Below $z_{\rm low}$ only a small fraction of the Universe remains neutral, residing mostly
in galaxies. Studying this residual neutral HI constitutes a different science case (e.g. galaxy
evolution, baryon acoustic oscillations, etc.) that will not be addressed in this white paper. It
is generally accepted that reionization is completed around $z_{\rm low} \sim 5-6$, based on the Gunn-Peterson
absorption as observed in high-z quasars (Fan et al. 2006). Conservatively, a rise in the
optical depth for HI absorption and dark gaps, presumably due to patches of neutral hydrogen,
start to appear around z = 5.6, which corresponds to $\nu_{\rm up} = 215$ MHz. This
is generally accepted to be the end of the EoR, although some neutral patches might remain at
lower redshifts at a level of as much as ~10% at z ~ 5 (Mesinger 2010). One might argue
that this is the transition phase between the EoR and the phase in which galaxies as we see
them today start to emerge and evolve over time. An upper frequency limit of $\nu_{\rm up} = 215$ MHz,
however, seems to be the best estimate at present for when the EoR ended.
A stringent lower frequency limit in principle does not exist, because hydrogen is mostly neutral
after the recombination era (z ~ 1100), although it is not expected to be observable at all
redshifts below that. A lower limit is therefore given by when the first redshifted 21-cm signals
are detectable. As explained in Section 3, this requires the spin temperature of the neutral
hydrogen gas to differ from the CMB temperature, which requires either the presence of Ly-α
photons or high densities. Below z ~ 30 extended regions of high density become very rare. When
sufficient Ly-α photons appear depends on early star formation and black hole growth, which is
not well known and motivates the high-redshift SKA observations of the Cosmic Dawn in the
first place. Based on the (nominal) theoretical models from Pritchard & Loeb (2008) as shown
in Figures 19 and 20, one infers a lower limit on the start of 21-cm absorption due to the first
stars and intermediate mass black holes of $\nu_{\rm low} = 54$ MHz, which corresponds to z = 25 (see
footnote 7). We note however that this value depends on the star forming efficiency and the
amount of radiation escaping the (proto-)galaxies in three spectral bands: Lyman band, ionizing
UV and X-rays. These parameters are somewhat constrained by the existing observational data,
but there is still large uncertainty. Figure 20 shows predictions for the evolution of the
brightness temperature power spectrum from both theoretical models (Pritchard & Loeb 2008) and
radiative transfer numerical simulations (Baek et al. 2010). One can check that the shape and
amplitude are similar in both approaches, but that the redshifts at which features appear differ.
Semi-numerical models (Santos et al. 2008) show the same pattern, with general agreement on the
shape and amplitude, but different redshifts for the features. The redshift discrepancy is mainly
due to the uncertainty in the star forming efficiency and the limited resolution of the
simulations. Moreover, the recent discovery of a coupling between large- and small-scale modes
in the dark-matter distribution during recombination, causing bulk velocity flows in the HI gas
(Tseliakhovich & Hirata 2010), predicts a substantial increase in the strength of 21-cm
brightness temperature fluctuations (factors of 2-3) up to redshifts as high as z ~ 40 (Visbal
et al. 2012; McQuinn & O'Leary 2012).



7 We note that these redshifts are not as precisely determined as quoted here from either observations or theory, 
but we would like to be precise in corresponding redshifts and frequencies. 

Consequently, if z = 25 is a reasonable upper limit encompassing most of the nominal models
above, a wider frequency range should be allowed for than the current 70-MHz lower limit,
although possibly not designed for (e.g. by using a flexible set of frequency filters). This would
allow the study of these new and exciting physical processes. Indeed, contrary to expectations,
LOFAR has taught us that interferometric imaging can be done down to frequencies of ~20 MHz
(see e.g. van Weeren et al. 2012).
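
The correspondence between the frequencies and redshifts quoted in this section follows from the redshifting of the 21-cm rest frequency. A minimal sketch is given below; the function names are illustrative only, and the rest frequency is taken as 1420.4 MHz (rounded to 1420 MHz in the text).

```python
NU_21_MHZ = 1420.4  # rest frequency of the 21-cm line in MHz

def redshift_from_frequency(nu_obs_mhz):
    """Redshift at which the 21-cm line is observed at nu_obs_mhz."""
    return NU_21_MHZ / nu_obs_mhz - 1.0

def frequency_from_redshift(z):
    """Observed frequency (MHz) of the 21-cm line emitted at redshift z."""
    return NU_21_MHZ / (1.0 + z)

# Band edges and reference frequencies discussed in Section 5.1.
for nu in (40, 54, 108, 190, 215, 240):
    print(f"{nu:4d} MHz -> z = {redshift_from_frequency(nu):5.2f}")
```

With these definitions 215 MHz maps to z of about 5.6 and 54 MHz to z of about 25, consistent with the limits adopted above.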



5.1.2 Extreme Range and Optimal Frequency 



An extreme lower limit - excluding exotic physics - arises when the Ly-α emissivity (and star
formation) is extremely strong early on (by a factor of around 100 above nominal) and is
stronger than X-ray heating. In that case, 21-cm absorption due to Wouthuysen-Field coupling
could start as early as z ≈ 35 (Pritchard & Loeb 2011), or at ν ~ 40 MHz, possibly leading
to further enhanced and observable 21-cm brightness temperature fluctuations due to bulk
flows at these redshifts (Visbal et al. 2012; McQuinn & O'Leary 2012). Similarly, as argued by
Mesinger (2010), hydrogen could remain partly (~10%) neutral until z ≈ 5 or ν ~ 240 MHz.



A more encompassing range, larger than 4:1, would therefore be 40-240 MHz (i.e. 6:1). One 
might strongly argue that this range should be allowed for by SKA, but not optimized for, since 
it covers most conceived and exotic EoR scenarios. 

The sensitivity of an antenna or a collection of antennae ('station') is normally expressed as
$A_{\rm eff}/T_{\rm sys}$, or in short A/T, which stands for the effective collecting area over the system
(noise) temperature. The dominant contribution to $T_{\rm sys}$ at low frequencies is the sky, whose
brightness temperature decreases quickly with increasing frequency. For dipoles collected in
stations, $A_{\rm eff}$ remains roughly constant below the optimal frequency $\nu_{\rm opt}$ and above it decreases
roughly as $\propto (\nu/\nu_{\rm opt})^{-2}$, as the station becomes sparse. Therefore, A/T will typically be optimal
near or above $\nu_{\rm opt}$. A precise choice of the optimal frequency also impacts the collecting area per
dollar. Choosing $\nu_{\rm opt}$ too low will lead to considerably more collecting area per dollar at low
frequencies but less at high frequencies. Given the frequency range of 54-215 MHz chosen as
being optimal and also a broad "peak" in the brightness temperature fluctuations as seen in the
right panel of Figure 19, an optimal frequency in the (logarithmic) middle of this range, i.e.
$\nu_{\rm opt} = 108$ MHz, would be best (see footnote 8).
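
The frequency behaviour of A/T described above can be sketched numerically. The example below is illustrative only and rests on two assumptions that are not taken from this document: the sky temperature is modelled with the commonly used approximation of 60 K times (wavelength in metres) to the power 2.55, and the effective area is idealised as exactly flat below the optimal frequency and falling as the inverse square of frequency above it. The receiver temperature parameter `t_rec_k` is a hypothetical knob, set to zero by default.

```python
C_M_PER_S = 299_792_458.0  # speed of light

def sky_temperature_k(nu_mhz, t0_k=60.0, index=2.55):
    """Assumed sky temperature model: t0 * (wavelength / 1 m)**index, in kelvin."""
    wavelength_m = C_M_PER_S / (nu_mhz * 1e6)
    return t0_k * wavelength_m ** index

def effective_area_rel(nu_mhz, nu_opt_mhz):
    """Relative A_eff: flat below nu_opt, falling as (nu/nu_opt)**-2 above it."""
    if nu_mhz <= nu_opt_mhz:
        return 1.0
    return (nu_mhz / nu_opt_mhz) ** -2

def a_over_t_rel(nu_mhz, nu_opt_mhz, t_rec_k=0.0):
    """Relative A/T for a sky-dominated system plus an optional receiver term."""
    t_sys_k = sky_temperature_k(nu_mhz) + t_rec_k
    return effective_area_rel(nu_mhz, nu_opt_mhz) / t_sys_k

nu_opt = (54.0 * 215.0) ** 0.5  # logarithmic middle of the 54-215 MHz band
print(f"logarithmic band centre: {nu_opt:.1f} MHz")
for nu in (54, 80, 108, 150, 215):
    ratio = a_over_t_rel(nu, 108.0) / a_over_t_rel(108.0, 108.0)
    print(f"{nu:4d} MHz: A/T relative to 108 MHz = {ratio:.2f}")
```

Under these assumptions A/T rises steeply up to the optimal frequency, because the sky temperature drops, and only slowly above it, because the loss in effective area nearly cancels the continued drop in sky temperature.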



5.1.3 Full Frequency Coverage 

It is concluded that 54-215 MHz (i.e. a 4:1 range) is the most favorable range to cover the
SKA-1 and 2 EoR science case for redshifts z = 5.6-25, with an optimal frequency at
$\nu_{\rm opt} = 108$ MHz (z = 12, but see footnote 8). The narrowest (3.5:1) range would be 54-190 MHz.
Despite these "most optimal" but narrow ranges, which in principle can be observed
with a single dipole receiver system with an efficiency exceeding 0.1, one might strongly argue
(in situations where possible) for these limits to be 'soft', allowing observations - albeit with
limited sensitivity - over a wider range of 40-240 MHz (z ≈ 5-35) to cover less likely and
more exotic scenarios for the start and finish of the EoR, and therefore potentially consider a
dual-band receiver system such as that of LOFAR for the lower and higher frequencies in the
range 40 to 240 MHz.

8 We note that this assumes that the collecting area (or $A_{\rm eff}$ per antenna) that can be purchased per dollar is a
function of $\nu_{\rm opt}$. We suspect, however, that it will not be a strong function and that $A_{\rm eff}$ goes up with $\nu_{\rm opt}$ going
down, for fixed costs, because dipoles do not grow linearly in cost with their effective collecting area. Hence, one
might consider the extreme case where dipole size is not a cost factor. In that case $A_{\rm eff}$ grows with $(\nu/\nu_{\rm opt})^{-2}$ and
offsets the loss in collecting area due to sparseness at frequencies greater than $\nu_{\rm opt}$. In that case, choosing a very
low value of $\nu_{\rm opt}$ makes more sense, but could be limited by other factors such as land use, etc. A final choice of
$\nu_{\rm opt}$ should therefore factor this cost in.

Figure 19: Top panel: Evolution of the CMB temperature $T_{\rm CMB}$ (dotted curve), the gas kinetic temperature
$T_K$ (dashed curve), and the spin temperature $T_S$ (solid curve). Middle panel: Evolution of the gas
fraction in ionized regions $x_i$ (solid curve) and the ionized fraction outside these regions (due to diffuse
X-rays) $x_e$ (dotted curve). Bottom panel: Evolution of the mean 21 cm brightness temperature $T_b$. In each
panel we plot curves for model A (thin curves), model B (medium curves), and model C (thick curves).
From Pritchard & Loeb (2008).



5.2 Antenna distribution, Sensitivity and Collecting Area 

Which antenna distribution for SKA-low would be optimal for reionization and cosmic dawn
studies is still a matter of debate. In this section we will draw some general conclusions which
depend on optimizing both power spectrum and tomography measurements. Different SKA
precursors (e.g. MWA, PAPER, LOFAR) follow different strategies, even though they all focus
on power spectrum determination. The differences are mostly in the ratio between longer and
shorter baselines, collecting area and 'core area', the area over which the shorter baselines are
distributed.

Figure 20: Left: Evolution of power spectrum fluctuations based on theoretical modeling. The different
curves show P(k, z) as a function of z at fixed k for k = 0.01, 0.1, 1, 10 Mpc$^{-1}$. Diagonal lines show
$\epsilon T_{\rm fg}(\nu)$, the foreground temperature reduced by a factor $\epsilon$ ranging from $10^{-3}$ to $10^{-9}$, to indicate the level
of foreground removal required to detect the signal. Adapted from Pritchard & Loeb (2011). Right:
Evolution of the brightness temperature power spectrum with redshift, based on numerical simulations, for
k = 0.07 h/Mpc (thin black), k = 0.19 h/Mpc (thick black), k = 1.00 h/Mpc (thin gray) and k = 3.15
h/Mpc (thick gray). From model S1 to S7 the X-ray contribution is increasing. From Baek et al. (2010).

In determining the antenna distribution, one should account for 

• How well can the science goals be achieved (i.e. power spectra, tomography, etc). This 
mostly involves maximizing the ability to detect the signal of interest over a wide range 
in redshifts and angular scales. 

• How well can (i) RFI be excised, (ii) the instrument be calibrated for instrumental and 
ionospheric effects and (iii) foreground contaminants be removed (see Sect. [4]). These 
questions focus more on observational strategies and on biases in the modeling, leakage, 
mode-mixing, covariance, etc. 

Any observation has a set of unknown parameters that need to be solved for: (1) the sky model 
as function of direction and frequency and (2) the instrumental/ionospheric model. Solving 
both without introducing artifacts in the resulting sky model (which includes the HI signal), that 
ultimately could prohibit the science to be carried out, is of utmost importance in CD/EoR ob- 
servations, especially because the signal is expected below the noise in all current experiments. 
For SKA the S/N ratio is expected to exceed unity for scales larger than a few arcminutes, at 
least for the higher frequencies. In this case biases should play a less important role. However, 
for power spectrum analyses at lower frequencies where some modes will have S/N<1, bias
can still be very important. 

Hence, whereas most studies and thinking thus far focused solely on how well the power spectra
of redshifted 21cm intensity fluctuations can be measured (e.g. Bowman et al. 2006; McQuinn
et al. 2006), this should not be taken as the only guidance for an array design. Whereas this is
indeed the goal of all current SKA precursors, in the era of SKA itself, tomography (i.e. direct
imaging of neutral hydrogen structures) is at least as important as measuring power spectra.
Tomography comes with its own requirements which we will discuss below.
An example of this is the following: the most prominent features during the late stages of
reionization are the ionized bubbles, which are a few to tens of Mpc in size; their positions
correlate on scales of ~120 $h^{-1}$ Mpc (Zaroubi et al. 2012), i.e. of order a degree at redshifts
around z ~ 10. These bubbles have a contrast of ~30 mK (see Equation 2) between their
fully-ionized inner region and the surrounding neutral hydrogen. What is important for mapping
these tomographic features is not a baseline distribution that maximizes the sensitivity for
power spectrum measurements on that angular scale. Such a distribution makes the array as
compact as possible and hence places many visibilities inside one uv-resolution element, through
which phase information is discarded. Instead, one needs baselines that have instantaneous
sensitivity on both the spatial and frequency scales of these bubbles (arcminutes, but also
degrees!) and at the position of these bubbles in the images.

Whereas a very compact array might be well-suited to measure power spectra for a range of
k-modes, using both their parallel (sky) and perpendicular (frequency) components, it will be
ill-suited to image small-scale (arcminute) bubbles in the hydrogen distribution. The latter
requires more baselines on scales of several kilometers rather than several hundred meters (see
e.g. Zaroubi et al. 2012). Long(er) baselines are also very useful for foreground subtraction and
calibration. They 'see' a simpler sky foreground consisting predominantly of relatively compact
sources. These sources constitute a major contaminant whose effects have to be removed from
the shorter baselines (where the CD/EoR HI signal predominantly is found) in order to reduce
confusion noise and strong model degeneracies. The ability to resolve compact sources also
allows for much easier calibration and three-dimensional ionospheric tomography as discussed
in Section 4.
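
The statement that arcminute-scale imaging needs baselines of kilometers rather than hundreds of meters can be checked with a simple diffraction estimate. The sketch below assumes that a baseline of length b is sensitive to angular scales of roughly the observed wavelength divided by b, ignoring order-unity factors; the function name and the chosen redshift are illustrative only.

```python
import math

C_M_PER_S = 299_792_458.0   # speed of light
NU_21_HZ = 1420.4e6         # rest frequency of the 21-cm line

def baseline_for_scale_m(theta_arcmin, z):
    """Baseline length (m) matching an angular scale via theta ~ lambda / b."""
    wavelength_m = C_M_PER_S * (1.0 + z) / NU_21_HZ   # observed 21-cm wavelength
    theta_rad = math.radians(theta_arcmin / 60.0)
    return wavelength_m / theta_rad

for theta in (1.0, 10.0, 60.0):
    b = baseline_for_scale_m(theta, z=10.0)
    print(f"{theta:5.1f} arcmin at z = 10 -> baseline of about {b:7.0f} m")
```

At z = 10 this gives roughly 8 km for 1 arcminute and somewhat over 100 m for 1 degree, which is the range of baselines the tomography argument above calls for.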

It is therefore critically important not only to optimize for power spectrum measurement, but 
also for tomography, and for the ability to calibrate the instrument and remove foregrounds. All 
this suggests that long(er) baselines are very useful, if not critical. 

5.2.1 Power spectra measurements 

Before considering the requirements for tomography and calibration, we first address those
related to power spectrum measurements. To measure the power spectrum of the redshifted
21-cm signal, baselines should be placed at uv-points that correspond to the k-modes of interest.
However, for measuring the three-dimensional power spectrum one should add to these the
modes from the (Fourier-transformed) frequency domain.

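Based on the derivation in McQuinn et al. (2006), and verified numerically, one can show
that for a constant density of visibilities in the uv-plane, the noise error on $k^3 P(k)/2\pi^2$ that
dominates over cosmic variance in most instances (except for SKA itself where the S/N exceeds
unity per mode), can be written as (see footnote 9)

$$\Delta^2_{\rm Noise} = \frac{2}{\pi}\, k^{3/2} \left[ D_c^2\, \Delta D_c\, \Omega_{\rm FoV} \right]^{1/2} \left( \frac{T_{\rm sys}}{\sqrt{B\, t_{\rm int}}} \right)^2 \left( \frac{A_{\rm core} A_{\rm eff}}{A_{\rm coll}^2} \right) \qquad (11)$$
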

This equation assumes that one integrates over $\Delta k = \epsilon k$ with $\epsilon = 1$ (i.e. one dex in k-scale; for
other values of $\epsilon$ the above equation scales as $1/\sqrt{\epsilon}$). It is extremely useful because its overall
scaling relations hold very well and can easily explain the difference between different arrays
(see discussion below). We note that $\Omega_{\rm FoV} = \lambda^2/A_{\rm eff}$ is the FoV of the smallest beam-formed
receiver element, which sets the area of the sky that can be observed in one single pointing.
The distances $D_c$ and $\Delta D_c$ are the comoving distance to the redshift at which the frequency band
is centered and the comoving distance corresponding to a bandwidth B at that redshift.
Hence the factor within the square root is the observed comoving volume. Because the error
on the power spectrum decreases as the square root of the number of independent k-modes (a
factor proportional to $k^{-3/2}$) and because $\Delta^2_{\rm Noise}$ scales as $\propto k^3 P_N(k)$, the overall scaling of
the noise error on the power spectrum is $k^{3/2}$. In addition, the term $(T_{\rm sys}/\sqrt{B\, t_{\rm int}})^2$ is the variance
of the total power of a single receiver element for a bandwidth B and total integration time
$t_{\rm int}$. Finally, the noise error scales with the factor $(A_{\rm core} A_{\rm eff}/A_{\rm coll}^2)$, where $A_{\rm core}$ is the core area
in which the receiver elements are distributed, $A_{\rm eff}$ is the effective collecting area per receiver
element and $A_{\rm coll}$ is the total collecting area of the array (i.e. the number of stations $N_{\rm stat}$ times
$A_{\rm eff}$). Even though this equation is only valid for a perfectly uniform density of uv-points, its
scalings are correct. Redistributing the receivers will only tend to tilt the dependence on k.
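
The factors described above can be combined into a small function to explore the scalings numerically. This is a sketch under stated assumptions, not a reference implementation: it assumes the noise error is the product of 2/π, k to the power 3/2, the square root of the observed comoving volume, the single-element variance and the array factor, exactly as listed in the text. All input values below are placeholders chosen only to exercise the function; they are not taken from this document.

```python
import math

def noise_power(k_per_mpc, d_c_mpc, delta_d_c_mpc, wavelength_m, t_sys_k,
                bandwidth_hz, t_int_s, a_core_m2, a_eff_m2, n_stat):
    """Thermal-noise error on the dimensionless power spectrum, in K^2.

    Assumed form (uniform uv-density):
    (2/pi) k^(3/2) sqrt(D_c^2 dD_c Omega_FoV) (T_sys^2 / (B t_int)) (A_core A_eff / A_coll^2)
    """
    omega_fov_sr = wavelength_m ** 2 / a_eff_m2                 # field of view (sr)
    a_coll_m2 = n_stat * a_eff_m2                               # total collecting area
    volume_mpc3 = d_c_mpc ** 2 * delta_d_c_mpc * omega_fov_sr   # observed comoving volume
    variance_k2 = t_sys_k ** 2 / (bandwidth_hz * t_int_s)       # (T_sys / sqrt(B t_int))^2
    array_factor = a_core_m2 * a_eff_m2 / a_coll_m2 ** 2
    return ((2.0 / math.pi) * k_per_mpc ** 1.5 * math.sqrt(volume_mpc3)
            * variance_k2 * array_factor)

# Placeholder inputs (hypothetical array and observation, for illustration only).
base = dict(k_per_mpc=0.1, d_c_mpc=9000.0, delta_d_c_mpc=170.0, wavelength_m=2.0,
            t_sys_k=400.0, bandwidth_hz=10e6, t_int_s=3.6e6,
            a_core_m2=math.pi * 500.0 ** 2, a_eff_m2=800.0, n_stat=1250)

ref = noise_power(**base)
smaller_stations = noise_power(**{**base, "a_eff_m2": 400.0, "n_stat": 2500})
double_k = noise_power(**{**base, "k_per_mpc": 0.2})
half_core = noise_power(**{**base, "a_core_m2": base["a_core_m2"] / 2.0})

print(f"halving A_eff at fixed A_coll: x{smaller_stations / ref:.3f}")
print(f"doubling k:                    x{double_k / ref:.3f}")
print(f"halving A_core:                x{half_core / ref:.3f}")
```

The ratios, rather than the absolute values, are the useful output here: they reproduce the dependence on k, on the core area and on the station size discussed in the text.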



Equation 11 highlights a number of points:

• First, we find that power spectrum sensitivity is much better, in an absolute sense, for
small k-modes. Although this would naively imply that more compact arrays are better, it
also implies that different k-modes are emphasized in that case. In general cosmological
information is mostly contained in the smaller-k modes (i.e. large scales, peaking around
1 degree), whereas the larger-k modes mostly probe the astrophysics of reionization and
the cosmic dawn (see Sections 2 and 3). Comparing arrays based on sensitivity at different
k-modes is therefore comparing sensitivity to cosmology versus that to the astrophysics
of reionization. One should compare arrays only for a fixed k-mode and then evaluate
how well the same scientific questions can be answered.

• Second, the noise error decreases when more cosmic volume is probed. This can be
seen from the first two terms. The error scales with the square root of the number of
independent k-modes within a range of $\Delta k$ (in the above equation $\Delta k = k$). Hence the
k-volume scales as $k^3$, but the comoving volume V scales with $\Omega_{\rm FoV} \propto 1/A_{\rm eff}$. Since the
size of an independent element scales as 1/V, one finds that the error scales as $\sqrt{V}$. At
the same time, the number of independent modes increases with $1/A_{\rm eff}$, which allows the
noise variance to scale as $A_{\rm eff}$ as shown in the last term of Equation 11. Combining the
two terms, one gains in sensitivity by $\sqrt{A_{\rm eff}}$.

9 This equation is not given in this form in McQuinn et al. (2006), but has been derived by LVEK using the
same method as outlined there. In this form it gives the important scaling relations with array parameters useful to
understand power-spectrum measurements.

• Third, shrinking the core area ($A_{\rm core}$) substantially increases the sensitivity of the array
(i.e. decreases the error), but at the cost of losing the longer baselines and sensitivity
for larger k-modes, if the collecting area of the array remains fixed. Shrinking the array
can partly offset the loss in sensitivity when its total collecting area ($A_{\rm coll}$) decreases,
but it also shifts sensitivity to lower k-modes, where the effects of the cosmic dawn and
reionization are far less obvious. Shrinking an array to compensate for loss in collecting
area is therefore not a cure for a loss in sensitivity, because it shifts the focus
of the science (i.e. from CD/EoR to cosmology).



• Fourth, the last factor in Equation 11 can be explained more intuitively as follows: as a
trick we multiply it by $(A_{\rm eff}/A_{\rm eff})$. We then see that $(A_{\rm eff}/A_{\rm coll})^2 = N_{\rm stat}^{-2}$, where $N_{\rm stat}$
is the number of stations inside the core. The remaining factor $(A_{\rm core}/A_{\rm eff})$ is the number of
independent modes in the uv-plane that are covered by all visibilities. Since the number of
visibilities is ~ $N_{\rm stat}^2/2$, the last factor in the above equation is nothing other than half the
inverse of the number of visibilities per uv-resolution element ($n_{\rm vis}^{-1}$). Combining the last
two factors, we see that it represents the noise variance per uv-cell after an integration time
$t_{\rm int}$ and using a bandwidth B. This factor also allows a simple scaling from image noise to power
spectrum noise, because it enters in the instantaneous noise error per image resolution
element.
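
The rewriting of the last factor can be laid out step by step. In the sketch below, $N_{\rm cell}$ (the number of independent uv-cells) and $n_{\rm vis}$ (the number of visibilities per uv-cell) are labels introduced here purely for illustration; the only inputs are $A_{\rm coll} = N_{\rm stat} A_{\rm eff}$ and a total of roughly $N_{\rm stat}^2/2$ visibilities.

```latex
\begin{align}
\frac{A_{\rm core} A_{\rm eff}}{A_{\rm coll}^2}
  &= \frac{A_{\rm core}}{A_{\rm eff}} \left( \frac{A_{\rm eff}}{A_{\rm coll}} \right)^2
   = \frac{N_{\rm cell}}{N_{\rm stat}^2},
   \qquad N_{\rm cell} \equiv \frac{A_{\rm core}}{A_{\rm eff}}, \\
n_{\rm vis} &\simeq \frac{N_{\rm stat}^2 / 2}{N_{\rm cell}}
  \quad \Longrightarrow \quad
  \frac{A_{\rm core} A_{\rm eff}}{A_{\rm coll}^2} \simeq \frac{1}{2\, n_{\rm vis}} .
\end{align}
```

Multiplying by the single-element variance $T_{\rm sys}^2/(B\, t_{\rm int})$ then gives a noise variance that falls inversely with the number of visibilities averaged in each uv-cell.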



In summary, Equation 11 has two main contributing components: (i) the first two factors (apart
from $2/\pi$) indicate the inverse of the square root of the number of independent k-modes and (ii)
the last two factors provide the noise variance per uv-cell. We note that this is very similar to
what was found in Morales & Hewitt (2004) but provides a somewhat more intuitive picture.



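Keeping $k$, $T_{\rm sys}$, $B$ and $t_{\rm int}$ the same for different array configurations, we
find the following scaling relations for the important array parameters:

$$\Delta^2_{\rm Noise} \propto \frac{A_{\rm core}\sqrt{A_{\rm eff}}}{A_{\rm coll}^2}
\propto \frac{A_{\rm core}}{N_{\rm stat}^2 A_{\rm eff}^{3/2}}
\propto \frac{A_{\rm core}}{\sqrt{N_{\rm stat}}\,A_{\rm coll}^{3/2}}, \qquad (12)$$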



Equation 12 shows that $A_{\rm coll}$ and $A_{\rm core}$ are the two critical parameters, because
they have the most impact on $\Delta^2_{\rm Noise}$. It is better to first set $A_{\rm eff}$ to the
optimal choice in terms of field of view and costs, and then vary $N_{\rm stat}$ until the required
$A_{\rm coll} = N_{\rm stat} \times A_{\rm eff}$ is reached for power spectrum and/or tomographic
requirements. We come back to this when discussing the required field of view, i.e. a maximum on
$A_{\rm eff}$ (see Sect. 5.3).
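For reference, the three equivalent forms of the scaling in Equation 12 follow from one another by substituting $A_{\rm coll} = N_{\rm stat} A_{\rm eff}$. The short check below is a sketch of that substitution only and introduces no new assumptions:

$$\frac{A_{\rm core}\sqrt{A_{\rm eff}}}{A_{\rm coll}^2}
= \frac{A_{\rm core}\sqrt{A_{\rm eff}}}{N_{\rm stat}^2 A_{\rm eff}^2}
= \frac{A_{\rm core}}{N_{\rm stat}^2 A_{\rm eff}^{3/2}},
\qquad
\frac{A_{\rm core}\sqrt{A_{\rm coll}/N_{\rm stat}}}{A_{\rm coll}^2}
= \frac{A_{\rm core}}{\sqrt{N_{\rm stat}}\,A_{\rm coll}^{3/2}}.$$

The first form is convenient when the station size is varied at fixed collecting area, the last when the collecting area is varied at a fixed number of stations.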



Table 2: Parameters used for the SKA precursors/pathfinders and different SKA configurations
to derive sensitivities for power spectrum measurements.

Telescope        $N_{\rm ant}$   Distribution   $A_{\rm eff}$ (m$^2$)   $R_{\rm core}$ (m)   $R_{\rm max}$ (m)
MWA              112             $R^{-2}$       14.5                    20                   750
PAPER            128             constant       7.1                     X                    150
LOFAR            48              $R^{-2}$       804                     150                  1500
LOFAR-AARTFAAC   288             constant       25                      X                    150
SKA              50; 150; 450    $R^{-2}$       $10^6/N_{\rm ant}$      500                  2000; 5000

Although collecting area is the most critical factor, a shortfall can partly be compensated for
(for a fixed but measurable $k$-range) by making the array more compact and splitting the
collecting area into smaller stations or receiver elements. The latter increases the number of
visibilities and lowers the thermal noise per uv-cell, but also increases the number of required
correlations by a large factor, increasing the correlator and processing costs substantially. Two
illustrative examples of how to improve the S/N by a factor of two are the following:

(a) Decreasing $\Delta^2_{\rm Noise}$ by a factor of two requires a four times smaller
$A_{\rm eff}$ for a fixed collecting area and core size, and thus generates 16 times more
visibilities, requiring a 16-fold more powerful correlator as well as substantially more computing
power to process and store these data.

(b) In contrast, the same factor of two requires only a factor $2^{2/3} \approx 1.6$ more
collecting area (see Equation 12). This might increase the price per station a little, but would
most likely be cheaper than the required correlator and processing costs and would probably not
affect the overall cost of running the array by very much. To collect the same sky coverage then
requires $4 \times 2^{2/3} \approx 6$ beams, at substantially lower correlator costs.
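The two examples can be checked numerically with a few lines of Python. This is a minimal sketch: the function name `noise_scaling` and the reference array (100 stations, unit areas) are hypothetical choices made for illustration, and only ratios between configurations carry meaning.

```python
def noise_scaling(a_core, a_eff, n_stat):
    """Relative noise power from Equation 12: A_core * sqrt(A_eff) / A_coll**2."""
    a_coll = n_stat * a_eff  # total collecting area
    return a_core * a_eff ** 0.5 / a_coll ** 2


# Hypothetical reference array in arbitrary units (only ratios matter).
ref = noise_scaling(a_core=1.0, a_eff=1.0, n_stat=100)

# (a) Same collecting area and core size, stations four times smaller.
split = noise_scaling(a_core=1.0, a_eff=0.25, n_stat=400)

# (b) Same number of stations, each station 2**(2/3) times larger.
grown = noise_scaling(a_core=1.0, a_eff=2 ** (2 / 3), n_stat=100)

print("improvement (a):", ref / split)
print("improvement (b):", ref / grown)
print("visibility growth in (a):", (400 / 100) ** 2)
```

Both routes give the same factor-of-two improvement, but route (a) multiplies the number of visibilities by 16, whereas route (b) leaves it unchanged.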

It is clear from the above arguments that the best approach to keep computational requirements
within limits is to increase $A_{\rm eff}$ (per station) to a level that still probes all $k$ modes
of scientific interest within its field of view and to ensure sufficiently good uv-coverage. This
most rapidly decreases the noise error on the power spectrum, if station size is not the largest
cost-driver. Although a balance might have to be found, it is probably far cheaper to build more
collecting area per station and create more station-beams, rather than increase the correlator
capacity by an enormous amount to build the same sky from many more visibilities. The former
approach also lends itself better to a staged build-out from SKA phase 1 to phase 2, because the
processing capacity for multi-beaming could be added later as processing power increases and
becomes cheaper.

Hence in summary: A simple scaling relation shows that improving power spectrum measurements
benefits far more from a modest increase in collecting area than from a split of the array
into more stations for a fixed collecting area. The station size is also limited by arguments based
on the required field-of-view and uv-coverage (see Section 5.3).



SKA compared to its precursors/pathfinders PAPER, MWA, LOFAR: Figure 21 shows
the results of a more precise numerical array-sensitivity calculation based on the formalism
in McQuinn et al. (2006). We use the latest numbers in the literature for PAPER, MWA and
LOFAR and compare the results to different array configurations for SKA. Table 2 lists the
parameters used. The formalism in McQuinn et al. (2006) reproduces Equation 11 exactly for
the same assumptions and the same scaling relations. To properly compare the different arrays,
we take $k = 0.1$ cMpc$^{-1}$ as the reference point at which to compare sensitivities.

Figure 21: Comparison of current arrays, PAPER, MWA and LOFAR, with SKA, assuming $B = 10$ MHz,
$t_{\rm int} = 1000$ hrs and $\Delta k = k$. For the existing arrays we assumed the latest
published (or inferred) specifications, see Table 2. The black line indicates the expected power
spectrum of the 21cm signal.



PAPER and MWA: We find that the current array configurations of PAPER and MWA perform
equally well, even though PAPER has a smaller collecting area ($A_{\rm coll}$) than MWA and a
similar number of stations. The lower collecting area of PAPER is compensated by making the array
even more compact than MWA, hence lowering $A_{\rm core}$. Equation 12 shows that this improves
the power spectrum sensitivity of the array. In addition, PAPER gains sensitivity by having a
somewhat smaller $A_{\rm eff}$, since only single dipoles are used rather than tiles. Overall this
results in PAPER and MWA having similar sensitivities to the power spectrum. Both PAPER and MWA
are able to probe only the smallest $k$ modes, because of their compact configurations. We note,
however, that the expected HI power spectrum drops quite rapidly below $k = 0.1$ cMpc$^{-1}$ (see
Figure 21), which mostly offsets the gain in sensitivity. These low $k$-values also predominantly
probe cosmology, rather than the evolution of ionized bubbles (e.g. Zaroubi et al. 2012). Hence
shrinking the array helps more than splitting the array if the collecting area is kept fixed, but
also shifts the science focus of the array from CD/EoR to cosmology.

LOFAR and LOFAR-AARTFAAC: LOFAR at the same $k$ value has an order of magnitude
better sensitivity, as can be seen in Figure 21. This is because LOFAR's collecting area exceeds
that of MWA by a factor of ~10, which yields a factor ~30 in sensitivity. This more than offsets
the factor 2.5 decrease in the number of stations, which increases $\Delta^2_{\rm Noise}$ by a
factor ~1.5. We note that the original MWA design had a four times larger collecting area, which
made it more equivalent to LOFAR in terms of power spectrum measurements. At lower $k$ values we
note that cosmic variance flattens the curves a little, making the difference mostly depend on the
power spectrum itself and the number of modes being probed. In that case more compact arrays with
larger beam-sizes will gain, but not enough to offset the difference. The beam-size of LOFAR
is relatively small, and hence the largest scales or smallest $k$ modes ($k < 0.01$ cMpc$^{-1}$,
not shown in Figure 21) cannot be measured well. However, the expected drop in the 21-cm power
spectrum on these scales makes these modes inaccessible even for SKA. Probing those very
large scales is therefore only possible for SKA if the array is split in extremely small receiver
elements, hugely increasing the required correlator and processing capabilities but potentially
making it impossible to calibrate the array (e.g. Braun 2012).
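The two factors quoted for LOFAR versus MWA follow from the last form of Equation 12, $\Delta^2_{\rm Noise} \propto A_{\rm core}/(\sqrt{N_{\rm stat}}\,A_{\rm coll}^{3/2})$. As a rough worked check, using only the rounded ratios quoted in the text (a factor ~10 in collecting area and ~2.5 in the number of stations) and ignoring the difference in core area between the two arrays:

$$10^{3/2} \approx 32, \qquad \sqrt{2.5} \approx 1.6, \qquad \frac{10^{3/2}}{\sqrt{2.5}} = \sqrt{400} = 20.$$

The net gain of about 20 under these simplifying assumptions is consistent with the order-of-magnitude difference described above.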



We have also calculated the power spectrum sensitivity of the AARTFAAC system (e.g. Prasad
& Wijnholds 2012). This addition to LOFAR allows all 288 tiles/dipoles of the superterp station
in the inner 300 m core to be cross-correlated and is currently being implemented. AARTFAAC
can improve the performance of LOFAR on scales $k < 0.2$ cMpc$^{-1}$ by a factor of five if it can
be properly calibrated. We note that such a hybrid array could be considered for SKA as well
to boost its power-spectrum capabilities.



SKA: Finally, Figures 21 and 22 show the sensitivity of SKA itself, varying $A_{\rm coll}$,
$N_{\rm stat}$ and also $A_{\rm core}$ (by varying the core radius $r_{\rm max}$) and the
distribution of the visibilities. Whereas the latter seems to have little impact, we note four
things: (i) One gains about a factor of ~3 in sensitivity by going from 50 to 450 stations, as
expected for a fixed $k$-mode. (ii) Small values of $k$ can be probed only for smaller station
sizes as expected, but as previously mentioned this requires an increased correlator capacity.
(iii) At $k < 0.1$ cMpc$^{-1}$, curves of similar beam size (i.e. number of stations) but different
core areas converge, suggesting that on larger scales the noise error is negligible and sample
variance dominates at redshifts below 10; above that redshift the sky is much brighter and the
noise error dominates also on large scales. None of the current arrays is in that regime. (iv) Over
the full range, a more compact array (i.e. two versus five km radius) performs better. Building a
more extended core (> 2 km) is therefore not required for power spectrum analysis, but could help
in tomography, as explained in the next subsection. Longer baselines of course remain important
for sky modeling, calibration, etc.

Figure 22: Comparison of different array configurations for SKA, assuming $B = 10$ MHz,
$t_{\rm int} = 1000$ hrs and $\Delta k = k$. Collecting area, number of stations and antenna
distribution are varied. The blue line indicates the expected power spectrum of the 21cm signal.

5.2.2 Tomography/imaging 

When considering power spectrum measurement, we saw that the collecting area, core area, and
station sizes have varying levels of impact on improving the power spectrum sensitivity.
Tomography is helped most by increasing the total collecting area on a given angular scale, since
increasing the number of modes (by increasing the beam size) does not help. These structures
cannot be imaged (except on very large scales) by using massive redundancy of baselines as
is done with extremely compact arrays that only focus on the 21cm power-spectrum detection.
Such compact arrays are incapable of imaging structures on scales of a few arcminutes, which
requires baselines of at least a few kilometers. Obtaining substantial sensitivity on those
baselines therefore requires spreading stations/receivers over a wider area, which goes somewhat
against the requirement for power spectrum measurement.

To illustrate the required tomographic sensitivity for SKA-1&2, we use two different criteria,
both set by the science requirements discussed in Sect. 3 of this white paper:

• Tomography to a $\delta T_b = 1$ mK level on scales of a few arcmin, required to map out hydrogen
brightness temperature fluctuations in the IGM over cosmic time during the cosmic dawn
and epoch of reionization.

• Tomography to a $\delta T_b = 10$ mK level over many arcmin scales to map out ionized bubbles
during the Epoch of Reionization.

Figure 23: Shown are the A/T requirements for SKA to reach 1 mK (left) or 10 mK (right),
respectively, in 1000 hrs of integration and using a BW of 1 MHz/matched (solid/dashed declining
lines). Top/bottom rows show arrays with a core diameter of 2 and 5 km, respectively. The
rising-flat solid lines indicate an SKA with a 1 km$^2$ physical collecting area and an optimal
frequency of 100 MHz. If the A/T requirement falls below this line for angular scales of 2, 5, 10,
20 and 60 arcminutes (top to bottom), then tomography at that angular scale can be done in 1000 hr
and with a BW = 1 MHz to that rms level.

The first criterion is already a requirement in the Design Reference Mission (DRM) of SKA
and is set by the expected HI brightness temperature features during the Cosmic Dawn and
Epoch of Reionization inside neutral patches (see Section 3.2). During the later phases of the
EoR, neutral hydrogen is being progressively ionized in bubbles/patches around star forming
galaxies. Because the hydrogen total intensity signal is ~30 mK, these patches have a much
larger contrast than HI fluctuations inside neutral patches (see e.g. Zaroubi et al. 2012, for a
discussion). In fact, they will be observed as 'holes' in the sky by an interferometer.

Again we assume an array with a physical collecting area of 1 km$^2$, an integration time of
1000 hrs for either a fixed bandwidth of 1 MHz or a bandwidth that matches the spatial
resolution. We assume a baseline distribution of $R^{-1}$ inside a 2 or 5 km diameter core. This
distribution roughly matches the baseline distribution that we assumed for the power spectrum
analysis, over an order of magnitude (from $|u| \sim 10^2 - 10^3$). The station size is not relevant
as long as the scales of interest are below the size of the station beam. In addition, we assume an
optimal frequency above which the array becomes sparse ($\propto (\nu_{\rm opt}/\nu)^2$) for
$\nu_{\rm opt} = 100$ MHz and $T_{\rm sys} = 100 + 300 \times (150\,{\rm MHz}/\nu)^{2.55}$ K.
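The assumptions above translate into a simple estimate of the available A/T as a function of frequency. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the effective area equals the physical area (1 km$^2$) at and below the optimal frequency and falls as $(\nu_{\rm opt}/\nu)^2$ above it, which is one reading of the "rising-flat" curves in Figure 23.

```python
def t_sys_kelvin(nu_mhz):
    """System temperature model quoted in the text (receiver plus sky)."""
    return 100.0 + 300.0 * (150.0 / nu_mhz) ** 2.55


def effective_area_m2(nu_mhz, a_phys_m2=1.0e6, nu_opt_mhz=100.0):
    """Effective area: assumed dense below nu_opt, sparse (nu^-2) above it."""
    if nu_mhz <= nu_opt_mhz:
        return a_phys_m2
    return a_phys_m2 * (nu_opt_mhz / nu_mhz) ** 2


for nu in (50.0, 100.0, 150.0, 200.0):
    a_over_t = effective_area_m2(nu) / t_sys_kelvin(nu)
    print(f"{nu:5.0f} MHz  T_sys = {t_sys_kelvin(nu):7.0f} K  A/T = {a_over_t:6.0f} m^2/K")
```

Under these assumptions A/T rises steeply with frequency below the optimal frequency, because the sky temperature drops, and changes only slowly above it, because the loss in effective area nearly cancels the further drop in sky temperature.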
Because the EoR centers around redshifts $z \sim 10 \pm 4$, we see from Figure 23 (right panels)
that such features could be imaged at a ≳3σ level on scales of a few arcminutes. If we ask what SKA
can do in terms of tomography at the 1 mK level, to which current instruments hope to get with
power spectra, one needs to look at Figure 23 (left panels). Scales larger than 5' can be reached
by an SKA with baselines up to 5 km down to around 50 MHz. At this resolution one can reach
1 mK rms in 1000 hrs at frequencies above 140 MHz, whereas at resolutions of 10', 20' and
60' this is reached at frequencies of 85, 60 and below 50 MHz. Hence we can map the brightness
temperature of HI to at least 1 mK on degree scales over all frequencies and redshifts. Bubbles
that have a much higher contrast and probably only occur at lower redshift (higher frequencies)
can be imaged on scales of ~2' as well, assuming they have 30 mK contrast. At high redshifts
large scales dominate and imaging at the tens of arcminute scales can be reached by SKA for
the nominal numbers given above.



5.2.3 Longer Baselines 

Whereas EoR/CD science will be mostly restricted to short (few-km long) baselines, experience
gained with e.g. LOFAR shows that longer baselines are extremely useful and potentially critical
to maximize our ability to calibrate the instrument and ionosphere and remove foregrounds
(especially compact sources). The deepest images at frequencies corresponding to redshifted
21-cm emission from LOFAR reach in all three cases a level of ~0.1-0.2 mJy rms continuum
noise over a bandwidth of 48 MHz on baselines out to several tens of kilometers. To reach this
level, directionally-dependent calibration using the longer-baseline data for compact sources
was crucial. Reaching this depth, i.e. the thermal noise level, using only the shorter baselines is
extremely difficult (see also Braun 2012).



Confusion Noise and FG removal: Whereas it is not clear that longer baselines are absolutely
critical, having longer baselines substantially reduces the computational effort of calibrating
the instrument, because the sky-model on these baselines consists of mostly compact, rather
than extended sources and confusion is substantially reduced (confusion noise on the shorter
baselines is a few mJy, much larger than the thermal noise). Compact sources can more easily be
removed using these longer baselines without impacting the science done on shorter baselines.



Directionally dependent instrument calibration: One other issue, mostly neglected in the
literature, is that the amount of information contained on longer baselines in general is larger
than on the shorter baselines, which are heavily redundant. The independent information that we
can maximally obtain per time-stamp is the number of independent resolution elements in
uvw-space. In the case of otherwise similar arrays, the one with on average much shorter
baselines will have a higher level of redundancy and therefore fewer independent data points that
can be used for calibration purposes, while at the same time reaching a higher signal-to-noise
ratio per mode (by having more visibilities per resolution element). Calibrating on shorter
baselines also requires a far more extended and complex sky-model (i.e. including the Milky Way
foreground, rather than mostly bright compact sources), which quite easily leads to larger
calibration errors. Finally, bias is increased if the non-linearity in the models is large and/or
the S/N ratio is small (e.g. Cook et al. 1986). The latter is still under-appreciated in the
current literature on calibration, but it is a well-known effect in ML-statistics in cases of low
S/N ratio and strong model covariance.



Ionospheric corrections: In addition to instrument calibration, longer baselines also allow for
an easier modeling of the three-dimensional ionosphere above the array. Just as for instrument
calibration, it is far easier to model the ionosphere using long baselines where its effects are
more clearly visible (either because compact sources move and distort through refractive
effects, or break up through diffractive effects). Applying long-baseline ionospheric solutions to
the shorter baselines through proper model projection has, in the case of LOFAR, further
simplified the modeling. Despite this, it remains difficult to correct for the ionosphere, and it
is hard to imagine how stable ionospheric solutions can be obtained with only short baselines.

All of this suggests that long baselines, although not proven to be absolutely essential,
substantially help in correcting deep integrations through (i) compact FG source removal,
(ii) calibration of the instrument on simple compact bright calibrators and (iii) visualizing and
correcting for ionospheric and directionally-dependent beam effects. Geometric arguments show that
the baselines needed should be as large as the imprint of the FoV at the height of the ionosphere.
This is roughly 25 km per 5° FWHM FoV for an ionosphere at 300 km height.
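The geometric argument can be reproduced with a one-line calculation. The sketch below is a simplification under stated assumptions: a flat geometry, a zenith pointing and a thin ionospheric layer at a single height; the function name is hypothetical.

```python
import math


def ionospheric_footprint_km(fov_fwhm_deg, height_km=300.0):
    """Diameter of the field-of-view cone where it crosses a thin layer at height_km."""
    half_angle = math.radians(fov_fwhm_deg) / 2.0
    return 2.0 * height_km * math.tan(half_angle)


print(f"{ionospheric_footprint_km(5.0):.1f} km")
```

For a 5° FWHM field of view and a 300 km layer this gives a footprint of about 26 km, in line with the rough figure of 25 km quoted above; the footprint scales almost linearly with the field of view for such small angles.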



5.2.4 Power spectra versus Tomography 

There could potentially be a conflict between the two strategies for the baseline distribution
discussed in the previous sections: sensitivity for power spectra naively drives one toward more
compact arrays (e.g. MWA and PAPER), whereas tomography of EoR bubbles and structure
requires baselines that can image on scales of a few arcminutes. As we saw, tomography benefits
from an increase in collecting area, whereas, for a fixed number of stations, such an increase
has little impact on the power spectra. The small $k$-modes will be sample-variance dominated,
so improvement can only come from an increase in the FoV. However, increasing the collecting
area by making the stations larger will actually reduce the FoV.



In Figures 24-27 we summarize the sensitivity of different array configurations and compare
these to the requirements from both tomography and power-spectrum determination. We
conclude that for tomography a collecting area of at least 0.5-1 km$^2$ is required for a nominal
1000 hr integration time, whereas the power-spectrum measurements require at least a few hundred
stations (i.e. station sizes smaller than ~50 meter); otherwise SKA could
perform even worse in power spectrum measurements than its current precursors that have much
smaller collecting areas but also smaller stations (i.e. large fields of view). In all instances,
SKA will be substantially better at tomography because of its superior sensitivity per spatial
resolution element.

The "sweet-spot" is therefore, as expected based on the original conception of the SKA,
that a collecting area approaching at least 1 square kilometer is required, but also that
station sizes should be relatively small (i.e. ≲50 m) compared to the 180 m that is currently often
mentioned in SKA documentation. A too-small field of view of SKA is detrimental for
power-spectrum measurements (see also Sect. 5.3 for a more extensive discussion of the issues
of field-of-view). How small stations really can become depends on correlator costs and
calibratability, which might be an issue for the current smallest SKA pathfinders such as MWA and
PAPER, but also for LOFAR (see Braun 2012).

Figure 24: Shown are the power spectrum and tomographic sensitivity for $k = 0.1\,h\,{\rm Mpc}^{-1}$
as described in Eqn. 11 for an array of diameter 2 km with a constant visibility density, for a
range of collecting areas, station diameters and redshifts. An integration time of 1000 hrs is
assumed and a bandwidth of 10 MHz for the power spectrum determination. For tomography (vertical
lines of constant collecting area) a bandwidth matching the $k$-scales at that redshift is assumed.
The dashed line indicates the demarcation below which the instantaneous uv-coverage has a filling
factor of order unity. The gray box demarcates the region for which the field-of-view of the array
exceeds the required ~5 degrees at 100 MHz.



5.3 Instantaneous Field-of-View and Multi-beaming 

The minimum field of view (i.e. that of the station beam) of SKA should be set by the largest
scale of the HI brightness temperature fluctuations that is of interest (or conceivably possible
to measure) over the redshift/frequency range indicated in the previous sections (e.g. Sect. 4.1).
The reason is that most of the information on scales exceeding this beam size is lost and can only
partly (if at all) be recovered by multi-beaming or mosaicking in the image or uv-space. This
requires deconvolution of the uv-data, however, which is computationally expensive and leads
to large uncertainties and errors if the beam-shape is not well known. If the largest relevant
scales are not contained in the station beam, they will thus mostly be inaccessible, both for
tomography and power spectrum measurements. As we showed in Section 4.1, it is important to
reach scales of order a few degrees.

Figure 25: Same as Fig. 24, for an array of 5-km diameter.



Building up an equivalent area of the sky through multi-beaming does not provide the same
image or information as when observing it with a single equivalently-large beam. Scales larger
than the individual station beams are effectively lost or highly uncertain when recovered.
The largest possible scale of interest should therefore fit well inside the beam size, such that
beam uncertainties do not play a major role. This scale is around a few degrees and corresponds
roughly to $k \sim 0.01$ Mpc$^{-1}$. This scale is of interest at redshifts of ≳12, which
corresponds roughly to 100 MHz. Hence, if the optimal frequency of the array is chosen at this
frequency, a 5 degree station beam size at this frequency should be sufficient to cover all scales
of interest for redshifts larger than $z \sim 12$, where the beam size increases. This implies a
station size of roughly ≲35 meters. Within the beam also a sufficient number of $k$-modes (~100)
can be measured to reduce the sample variance below ~3% per beam (Sect. 4.1). Multi-beaming could
reduce this sample variance further.

Below this redshift, the amplitude of the power spectrum on these large angular scales rapidly
decreases and the dominant $k$-modes are on arcminute to tens of arcminutes scales. These scales
also easily fit within the beam even at the highest frequencies (lower redshifts) of around 200
MHz, where the beam size would reduce to ~2.5 degrees for station sizes ~35 m. A beam
size of ~10 degrees at ~50 MHz, ~5 degrees at ~100 MHz and ~2.5 degrees at ~200 MHz
therefore seems sufficient to image all possible scales of interest over the full
frequency/redshift range (say 50-200 MHz).
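The quoted beam sizes can be reproduced from the station diameter with the usual diffraction estimate. The sketch below assumes a beam width of simply $\lambda/D$ (the exact prefactor depends on the station illumination and is taken to be unity here) and uses the 21-cm rest frequency to convert redshift to observing frequency; the function names are hypothetical.

```python
import math

C_M_PER_S = 299_792_458.0   # speed of light
HI_REST_MHZ = 1420.40575    # rest frequency of the 21-cm line


def observed_frequency_mhz(z):
    """Observed frequency of the 21-cm line emitted at redshift z."""
    return HI_REST_MHZ / (1.0 + z)


def beam_fwhm_deg(nu_mhz, d_stat_m):
    """Approximate station beam width, taken as lambda / D (prefactor assumed 1)."""
    wavelength_m = C_M_PER_S / (nu_mhz * 1.0e6)
    return math.degrees(wavelength_m / d_stat_m)


for nu in (50.0, 100.0, 200.0):
    print(f"{nu:5.0f} MHz: {beam_fwhm_deg(nu, 35.0):4.1f} deg")
print(f"z = 12 is observed at {observed_frequency_mhz(12.0):.0f} MHz")
```

With a 35 m station the estimate gives close to 10, 5 and 2.5 degrees at 50, 100 and 200 MHz, matching the values in the text, whereas a 180 m station would have a beam of about one degree at 100 MHz.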

The required station size of around 35 meters or less is substantially smaller than the 180 meter
station currently proposed for SKA in phase 1. The latter stations would preclude the detection
of the largest scale modes at all redshifts, effectively excluding high-z cosmology and EoR
studies. Many extremely interesting physics phenomena (e.g. bulk-flows, etc) are occurring at
the degree scale [10].

Figure 26: Same as Fig. 24, for an array of 2-km diameter and $k = 1\,h\,{\rm Mpc}^{-1}$.

At the same time, there is no need to go to stations much smaller than ~35 meters. For mapping
scales larger than the station beam, multi-beaming can be used. These images will not contain
structures larger than the beam-size, for example from the Milky Way (the greatest contaminant
in EoR studies), but they will provide a complete census of the EoR and Cosmic Dawn without
substantial loss of information. We note that mosaicking either in image or uv-space is
considerably cheaper computationally than cross-correlating all elements and producing images on
scales far exceeding 5 degrees, since no CD/EoR-relevant science is expected on these scales.

5.3.1 Global Signal Requirements 

The problem of measuring the global 21cm signal is one of bandpass calibration of instrument
gain and receiver temperature, and not of collecting area or thermal noise. In theory, a single
well calibrated dipole can build up the required signal to noise ratio within a day. While
single antenna experiments have put interesting constraints on the global 21cm signal (Bowman
& Rogers 2010), in practice they suffer from systematic errors introduced by the frequency
dependent instrument response. Having an antenna element as part of an interferometric array
such as SKA provides significantly better constraints on bandpass gain calibration and
estimation of the frequency dependent antenna element beam. Moreover an array environment
facilitates better algorithms for RFI detection, provides information on ionospheric and multipath
conditions for assessment of data quality and facilitates foreground subtraction using
interferometric sky images.

[10] We note that somewhat larger beams might be ok (maybe up to 70 meters) with multi-beaming and
uv-plane dithering, but this will require careful thinking about how to connect these multi-beamed
data into a single power spectrum in overlapping areas. It can best be done in the uv-plane by
combining the visibilities brought to a common phase-center. Finally a hybrid system where only a
sub-set of receiver elements inside stations are beam-formed and correlated could be considered.

Figure 27: Same as Fig. 24, for an array of 5-km diameter and $k = 1\,h\,{\rm Mpc}^{-1}$.

Cross-correlations (or visibilities) measured by an interferometer are not sensitive to a global
signal, as the global signal has significant power only at the origin of the uv-plane (zero
baseline). On the other hand, the autocorrelations of elements in an array provide a zero baseline
measurement. They contain the global signal power as well as the receiver noise (which drops out
of the non-zero baseline visibilities), and the latter has to be modeled in addition to the
instrumental gain. This has traditionally been achieved by switching between the sky and a
calibrated noise source. Therefore, using antenna elements that are both part of an array and have
calibrated noise injection has the potential to significantly reduce systematic errors.

Another possible approach to detect the global 21cm signal involves blocking a significant
portion of the instrument field of view with an obstacle such as the moon (or even a man-made
structure). This creates a hole in the otherwise uniform global signal and couples power into the
visibilities (cross correlations) even on non-zero baselines. This technique enables measurement
of a global signal without using the auto correlations and hence does not need switching between
the sky and a calibrated noise source. However, it is still in its infancy (Briggs et al. in prep.).

There are several current efforts attempting a global signal measurement. EDGES (single
dipole) targets the decline of the HI signal through reionization. New concept experiments
are attempting a detection of the expected absorption trough at ~70 MHz where the first
luminous structures were formed: array based observations with LOFAR-LBA stations (Harish
& Koopmans in prep.), observations with the Long Wavelength Array (LWA) in beam forming
mode (Bowman priv. comm.), observations with the Large-aperture Experiment to detect the
Dark Ages (LEDA, Greenhill & Bernardi 2012) and the Dark Ages Radio Explorer (DARE, Burns
et al. 2012; Harker et al. 2012), a proposed single dipole space mission.



Due to its less demanding requirement on collecting area and number of stations, a global 21cm
measurement has the potential to generate early science for the SKA during its commissioning
phase. Successful measurement of the global signal also has the potential to establish the
reionization redshift range accurately for future SKA tomography experiments. A global signal
experiment will also provide useful benchmarks on the calibratability of the instrument's
frequency response. To facilitate a global signal measurement with the SKA, it is desirable to
have calibrated noise injection on at least a few antenna elements. These antenna elements may
be dipoles that are physically apart from the SKA stations. Furthermore, to measure the full
evolution of the global signal, a frequency coverage of ~40 to 200 MHz is desirable.



5.3.2 Beam-size and calibration 

Apart from the science requirements, there are also calibration issues that need to be considered
in the choice of beam size, as recently worked out in great detail by Braun (2012). A larger
beam size requires more directionally dependent solutions (e.g. for the ionosphere, beam shape,
etc). Multiple smaller beams in that case have the advantage that they can be flexibly distributed
over the beam of the smallest (non beam-formed) receiver element of the station, allowing for
example bright calibrators to be placed in or near each station beam. This might not be possible
for a single large beam if there is an insufficient number of bright calibrators in a single
contiguous region. A second consideration is related to the ionosphere, which for a wide beam
requires a full three-dimensional treatment (e.g. Koopmans 2010). Although this still holds for
beams of ~5 degrees at 100 MHz, modeling the ionosphere for multiple smaller fields will be easier
than for a single contiguous wide field, especially at higher resolutions.

We thus conclude that the upper limit on the station size is set by science requirements, whereas
the lower limit is mostly set by costs (e.g. correlator, electronics, etc) and the ability to
calibrate the instrument. A sweet-spot seems to be around ~35 m, although somewhat smaller and
larger sizes could probably be accommodated. We again stress that 180 m stations are really
detrimental for CD/EoR science because of the tiny instantaneous field of view of such an array and
the inability to recover large-scale information on the sky through multi-beaming.



5.3.3 Connection between station size, total collecting area and uv-filling

Finally, the station size (hence beam size) is intimately connected to the instantaneous uv-filling.
For a fixed physical station size $D_{\rm stat}$ inside a core diameter of $D_{\rm core}$, it is easy to show that for
a fully non-redundant array the requirement for the minimum number of stations to acquire full
instantaneous uv-coverage is

$$N_{\rm stat} \gtrsim D_{\rm core} / D_{\rm stat}.$$

For, say, $D_{\rm core} = 2$ km and $D_{\rm stat} = 35$ m the minimum number of stations would be at least
~60, and fewer for larger stations. Very large stations only lead to good instantaneous
uv-coverage for a smaller core area, if the total collecting area is fixed, which comes at the cost of
losing resolution in tomography (again emphasizing the sometimes conflicting power-spectrum
and tomography requirements). Note that such an array will have a collecting area of

$$A_{\rm coll} = (D_{\rm core}/D_{\rm stat}) \times \pi (D_{\rm stat}/2)^2 = (\pi/4)\, D_{\rm core} \times D_{\rm stat}.$$

For the numbers given above this would be roughly 60,000 m^2, similar to LOFAR, which is insufficient
for most of the SKA CD/EoR science requirements. Hence there is friction between instantaneous
uv-coverage, core collecting area and station size. To have the required station beam size
of at least 5 degrees at 100 MHz one simply needs stations that are not too large. This requires
many more stations than for 180-m size stations, for example, but would satisfy the instantaneous
uv-coverage criterion. Even if the latter is not critical, to satisfy the A/T requirements
of SKA a larger number of smaller stations is strongly preferred, even required, over a smaller
number of larger stations. This is particularly true if they are distributed over a core area with a diameter
of 2-5 km, although this can be alleviated if the core area is made smaller. Since tomography on
scales corresponding to 5 km baselines will be very hard, a smaller core area (~2 km) might
be the best option, possibly with most of the stations within a core area of a few km. We note that
this statement is independent of the use of longer baselines for calibration purposes.
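
The two relations of this subsection can be evaluated with a short sketch. It assumes filled
circular stations of diameter D_stat, and the function names are hypothetical, introduced only
for this illustration.

```python
import math


def min_stations(core_diameter_m: float, station_diameter_m: float) -> float:
    """Minimum station count for full instantaneous uv-coverage: D_core / D_stat."""
    return core_diameter_m / station_diameter_m


def collecting_area_m2(n_stations: float, station_diameter_m: float) -> float:
    """Total area of n_stations filled circular stations of the given diameter."""
    return n_stations * math.pi * (station_diameter_m / 2.0) ** 2


d_core, d_stat = 2000.0, 35.0  # metres, the example values used in the text
n_min = min_stations(d_core, d_stat)
area = collecting_area_m2(n_min, d_stat)
print(f"N_stat >~ {n_min:.0f}, A_coll ~ {area:.0f} m^2")
```

For a 2 km core and 35 m stations this gives just under 60 stations and a collecting area of
about 5.5 x 10^4 m^2, consistent with the rounded values quoted in the text.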
6 Recommendations



In this section, we summarize succinctly our recommendations for an optimal SKA array
configuration for CD/EoR science requirements.

6.1 Choosing an optimal and affordable SKA CD/EoR array 

We conclude that the best strategy for designing an SKA array optimized for EoR/CD science is:
(1) Set the largest scale one would like to probe and match the station size to that requirement;
additional sky area is then built up via multi-beaming. (2) Scale the core area to an area that
can probe all power-spectrum k-modes of interest and still enable tomography (i.e. imaging)
on the smallest angular scales probed by $k_\perp$, which require somewhat longer baselines. (3) Set
the number of stations or collecting area (since the station size has now been fixed) to reach
the required tomography and power-spectrum sensitivity level and use multi-beaming to reduce
sample variance. We note that in phase-1 one might start with a more compact array with some
longer baselines, and during phase-2 extend the core further, as well as add more long baselines.
A very compact array, however, would not allow tomography on small (arcminute) spatial scales.

6.2 Towards an optimal reference design 

A reference design that satisfies all criteria for power spectrum determination and tomography
from arcminute to degree scales would be the following for SKA-1 and 2 [11]:

1. An absolute minimal frequency range of 54-190 MHz; an optimal frequency range of
54-215 MHz; and a wide frequency range of 40-240 MHz. The latter frequency range fully
covers the Cosmic Dawn and EoR eras for all currently conceivable scenarios, whereas
the first one is the narrowest: a 3.5:1 range that a single dipole receiver element can cover
with more than 10% efficiency over the entire bandwidth [12]. A full frequency coverage
(40-240 MHz) would argue for a dual-band receiver system at the lowest frequencies.

2. A frequency resolution of ~100 kHz suffices for power-spectrum and tomography studies
since it corresponds to scales well below the spatial scales that can conceivably be measured by SKA. A
much higher resolution of ~1 kHz, however, is required for neutral HI absorption line
studies to resolve very narrow (i.e. low-velocity dispersion) lines and for RFI excision.

3. A physical collecting area $A_{\rm coll} \gtrsim 1\,{\rm km}^2 \times (\nu_{\rm crit}/100\,{\rm MHz})^{-2}$ for $\nu_{\rm crit} < 100$ MHz and
at least 1 km^2 for $\nu_{\rm crit} > 100$ MHz. This collecting area ensures sensitivities of ~1 mK
on scales of ~5 arcminutes (with matching bandwidths) over the entire redshift range of
the SKA-low, sufficient to accomplish most science goals in 1000 hrs of observing time.

11 We recognize that more details need to be worked out especially on costs and calibratability of the array, but 
as far as we are aware this array design does not have major show stoppers and could be used as a starting point 
for a more detailed design. 

12 These ranges have been argued for as well in the Memo "Is There an Optimum Frequency Range for SKA1-lo?
Question 1 of the Magnificent Memoranda II" by Huynh et al.

4. An optimal frequency ($\nu_{\rm opt}$, corresponding to a $\lambda/2$ size of a receiver dipole) around
100 MHz, but possibly lower with increasing physical collecting area to compensate for
loss in effective collecting area (see item above) if the size of the receiver element is not a
major cost driver and if station-beam side lobes can be suppressed sufficiently by placing
the receiver elements in a semi-random pattern.

5. A core area with a diameter of ~5 km with most collecting area (~75%) inside the inner
2 km for power-spectrum and large-scale tomography studies, plus baselines out to 5 km
for arcminute-scale (i.e. EoR bubble) tomography. This is in line with current ideas on
the layout of the SKA core area. The total core collecting area should be at least that
given in item 3.

6. A set of longer baselines (~10-20% of the core collecting area) out to ~100 km for
calibration, ionospheric modeling [13] and for building a detailed sky model. SKA pathfinders
(in particular LOFAR) show that many image artifacts are due to errors in the subtraction
of bright compact sources and the ionosphere, and both benefit tremendously from the use
of longer baselines to model the sky and the ionosphere.

7. A station size of order 35 m, which corresponds to a 2.5-10 degree field of view from
200 MHz down to 50 MHz and covers all (known) scales of interest. We propose that
multi-beaming can be used to cover larger areas of the sky simultaneously and reduce
sample variance in power-spectrum studies. A very small station size is likely more costly
and harder to calibrate, and a much larger station would be detrimental for the science case.
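
To make the collecting-area requirement of item 3 concrete, the following sketch evaluates it for
a few values of the critical frequency and converts the result into a number of 35 m stations.
Treating each station as a filled circular aperture is an assumption made here for illustration,
as are the function names and the sample frequencies.

```python
import math


def required_area_km2(nu_crit_mhz: float) -> float:
    """Physical collecting area (km^2) from item 3: 1 km^2 * (nu_crit / 100 MHz)^-2
    below 100 MHz, and 1 km^2 at or above 100 MHz."""
    if nu_crit_mhz < 100.0:
        return (nu_crit_mhz / 100.0) ** -2
    return 1.0


def stations_needed(area_km2: float, station_diameter_m: float = 35.0) -> int:
    """Number of filled circular stations needed to reach the given area."""
    station_area_m2 = math.pi * (station_diameter_m / 2.0) ** 2
    return math.ceil(area_km2 * 1e6 / station_area_m2)


for nu_crit in (50.0, 70.0, 100.0, 150.0):
    area = required_area_km2(nu_crit)
    print(f"nu_crit = {nu_crit:5.1f} MHz: A_coll >~ {area:.2f} km^2, "
          f"~{stations_needed(area)} stations of 35 m")
```

Under these assumptions 1 km^2 corresponds to roughly a thousand 35 m stations, and halving the
critical frequency from 100 MHz to 50 MHz quadruples the required physical area.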



The proposed basic SKA-low array design allows most CD/EoR science goals, described in this 
white paper, to be reached in 1000 hrs of observing time, but undoubtedly will raise many new 
and groundbreaking scientific questions as well. We note that many of the requirements are 
already part of the DRM of SKA. However, we also propose a number of clear differences: We 
argue for 

(i) a wider (and lower) frequency range, going well below the 70 MHz envisioned for SKA-1.
This could potentially argue for the use of a dual-band receiver system.

(ii) a smaller station size of ~35 m, compared to the ~180 m currently envisioned for SKA-1,
to ensure that all scales of interest easily fit within a single station beam,

(iii) the use of longer baselines out to tens of kilometers for instrument/ionosphere calibration
and sky-modeling purposes.



13 These baselines exceed the imprint of the station beam on the ionosphere and larger beam-sizes therefore
require longer baselines, somewhat counterintuitively.
7 Summary



This White Paper summarizes some of the exciting scientific prospects of studying the Cosmic
Dawn (CD) and Epoch of Reionization (EoR) through the redshifted 21-cm emission line, in
particular focusing on prospects for the Square Kilometre Array (SKA) and how science
goals translate into a basic reference design that allows these science goals to be reached. We
focus not only on power-spectrum measurements, which are driving all current EoR pathfinder
telescopes (e.g. GMRT, PAPER, MWA, LOFAR), but also, and more importantly, on
imaging/tomography of the Cosmic Dawn and Epoch of Reionization. Whereas the basic reference
design that we propose is relatively close to that proposed in many SKA documents and memos,
it deviates in a few instances. The most important are (a) a frequency coverage extending
to lower frequencies / higher redshifts than previously envisaged, driven mostly by improved
knowledge since 2004 (when the SKA Science Case was written) of when reionization occurred
and what physics plays a role, and (b) a station size substantially smaller than the currently
proposed sizes for SKA 1 and/or 2, but substantially larger than the single receiver elements
that also have been suggested (e.g. HERA). The latter choice is driven mostly by the need
for imaging/tomography of the largest relevant scales while keeping the correlator costs as low as
possible. Beyond these largest relevant scales the loss in field of view can be compensated
through multi-beaming. Whereas we regard this manuscript as a living document which, using
input from the community, will be updated as our understanding progresses, we view this first
version as a starting point for a realistic SKA-low array for CD/EoR studies.

The SKA is expected to revolutionize the studies of the earliest phases of star and galaxy
formation in the Universe. Whereas precursors can only probe the power spectrum of the 21cm
signal from the Epoch of Reionization, SKA will allow tomography of most, if not the entire, 
period, on scales which will allow us to follow the growth of ionized regions from initially 
small to large. At the same time, SKA will also for the first time explore the earlier phases 
of the Cosmic Dawn, before substantial reionization, probably mostly in a statistical way, but 
possibly also with low resolution tomography. These studies will teach us fundamentally new 
things about both the earliest phases of star and galaxy formation, as well as cosmology and 
even have the potential to lead to the discovery of new physical phenomena.

The 21cm signal from neutral hydrogen can be analyzed in different ways. Imaging at different
frequencies will give us a tomographic volume with both spatial and evolutionary information. 
These data sets can be analyzed to characterize the sizes and shapes of ionized regions, as well as 
the density structures in still neutral regions. This information can also be combined with other 
probes of the EoR/CD, such as galaxy and QSO surveys, and the different types of background 
radiation (NIRB, atomic & molecular lines, CMB), allowing us to make the connection between 
the properties of the galaxies and the effect they have on the IGM. 

Analysis of the power spectrum of the 21cm signal will characterize the relevant length scales, 
as well as provide fundamental cosmological information through the analysis of the redshift 
space distortions. Since the distribution function of the 21cm signal is non-Gaussian, further 
analysis using higher order statistics will also be used. 

The discovery of bright radio sources from the EoR will enable us to study the 21cm forest, 
giving information about small scale structures in the IGM. Using the auto-correlations of the 
SKA elements will allow us to extract the evolution of the global 21cm signal, tracing the global
rise of star formation and the emergence of X-ray sources during the Cosmic Dawn through to 
the gradual disappearance of the neutral hydrogen during the EoR. 

In order to trace the relevant size scales, fields of view of 1°-2° already at the highest
frequencies are required. These fields need to be observed within one observation and cannot be
reconstructed from different observations of smaller fields.

The redshifted 21cm signal has to be retrieved from data containing strong foregrounds, mostly 
galactic, but also extragalactic. In order to minimize the effects of foregrounds, the observed 
fields should be located in areas of low foreground emission and without strong polarization 
features. The techniques for foreground subtraction will be put to the test on the data from SKA 
precursors in the coming few years. 

Additional complications to deal with are radio frequency interference (RFI) and effects caused 
by the ionosphere. Experience with LOFAR shows that even in a radio-loud environment, RFI 
can be dealt with, provided the telescope has sufficient frequency and time resolution, as well as 
a sufficient number of ADC bits. Correcting for ionospheric effects will require reconstructing 
some of the three-dimensional structure of the ionosphere. For this, data from long baselines are
important.

Since the redshifted 21cm signal is weak and the foregrounds strong, the accuracy of calibration 
has to be high in order to extract the 21cm signal from the total signal. Apart from dealing 
with effects caused outside the array, this also puts constraints on the quality of the hard- and 
software components of SKA. Any effects caused by these components should be small and 
stable enough to be calibrated out. 

The active precursors to SKA-low, namely GMRT, LOFAR, PAPER and MWA, have not yet 
succeeded in detecting any redshifted 21cm signal. Still, the experiences gained in observing at 
these low frequencies are important for the development of SKA-low. The coming years will 
see an increase in the activities of many of these precursors. The expectation is that one or more 
of the precursors will at least statistically detect a signal from the EoR. It is essential that the 
experience from these activities will keep finding its way into the SKA project.

To enable the exciting prospects of in-depth studies of the Cosmic Dawn and EoR, the
low-frequency part of SKA needs to be carefully designed to maximize the scientific return. Ideally
we should be able to trace the entire CD/EoR period, implying a frequency coverage of
40-240 MHz, with an optimal frequency of around 108 MHz. Sacrificing the lower frequency part
of this interval will remove the capability to trace the effects of the earliest stars, and lowering
the maximum frequency will prevent us from mapping out the last larger neutral patches
remaining in the Universe. If a ratio of 6:1 is unattainable, a still acceptable range would be
54-215 MHz (4:1) or 54-190 MHz (3.5:1).
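
The redshift ranges implied by these frequency bands follow from z = nu_0/nu_obs - 1, with
nu_0 the rest frequency of the 21-cm line (about 1420.4 MHz). The sketch below is only an
illustration; the band labels and the function name are hypothetical and do not appear elsewhere
in this paper.

```python
REST_FREQ_MHZ = 1420.405751768  # rest frequency of the 21-cm hyperfine line


def redshift(obs_freq_mhz: float) -> float:
    """Redshift at which the 21-cm line is observed at obs_freq_mhz."""
    return REST_FREQ_MHZ / obs_freq_mhz - 1.0


bands = {"wide": (40.0, 240.0), "optimal": (54.0, 215.0), "minimal": (54.0, 190.0)}
for name, (lo, hi) in bands.items():
    print(f"{name:8s} {lo:.0f}-{hi:.0f} MHz: ratio {hi / lo:.1f}:1, "
          f"z = {redshift(hi):.1f} to {redshift(lo):.1f}")
```

The 40-240 MHz range thus spans redshifts from about 5 to about 35, while a lower limit of 54 MHz
restricts the coverage to redshifts below about 25.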

Measuring a power spectrum and performing tomography impose different types of constraints 
on the design of an interferometer. We propose that the best compromise between the two is to 
first determine the total collecting area needed for tomography and to then choose the station 
size needed to capture the relevant angular scales. Imaging requirements of ~1 mK sensitivity
on scales of a few arcminutes imply a collecting area of ~1 km^2, as has been clear from the
inception of SKA.

For the CD/EoR, the largest angular scale to which we have to be sensitive is around 5°. An
optimal choice for the station size is therefore around 35 m. It is important to stress that
multi-beaming/mosaicking cannot be used to reconstruct information about scales larger than a station
beam since this information is not contained in the observations.

Since tomography at angular scales smaller than a few arcminutes is not possible anyway, the
large majority of the stations should be distributed over an area with a diameter of about
2-5 km. In addition to this there should be much longer baselines, comprising 10-20% of the
core collecting area, which are needed for
calibration. Experience with LOFAR has shown that resolving bright objects to angular scales
10x below the arcminute scales one is interested in is essential. These longer baselines
are also expected to substantially improve the capabilities to correct for ionospheric distortions.
Therefore stations out to ~100 km from the centre of the core area are required.
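
The angular scales reached by these baselines can be estimated with theta ~ lambda/B. The sketch
below is a simple illustration; the observing frequency of 150 MHz and the function name are
assumptions chosen here, and the approximation ignores weighting and uv-coverage effects.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def resolution_arcsec(freq_mhz: float, baseline_m: float) -> float:
    """Approximate angular resolution in arcseconds, assuming theta ~ lambda / B."""
    wavelength_m = C / (freq_mhz * 1e6)
    return math.degrees(wavelength_m / baseline_m) * 3600.0


for baseline_km in (2.0, 5.0, 100.0):
    theta = resolution_arcsec(150.0, baseline_km * 1e3)
    print(f"B = {baseline_km:5.0f} km -> ~{theta:.0f} arcsec at 150 MHz")
```

At 150 MHz, baselines of 2-5 km correspond to scales of a few arcminutes down to about one
arcminute, whereas 100 km baselines reach a few arcseconds, more than a factor of ten below the
arcminute scales of interest.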
To enable 21cm absorption studies against bright sources, a frequency resolution of 1 kHz is 
required. This resolution is also beneficial for RFI excision. Although the site of SKA-low is 
characterized by a very low level of RFI, both dealing with the remaining RFI and with the 
ionospheric effects call for a high frequency resolution, as well as time resolution. 
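
The velocity width of a frequency channel follows from dv = c dnu/nu. As a minimal sketch, with
100 MHz chosen here as an illustrative observing frequency and a hypothetical function name:

```python
C_KM_S = 299_792.458  # speed of light in km/s


def velocity_resolution_km_s(channel_khz: float, obs_freq_mhz: float) -> float:
    """Velocity width of one channel: dv = c * dnu / nu."""
    return C_KM_S * (channel_khz * 1e3) / (obs_freq_mhz * 1e6)


for channel_khz in (1.0, 100.0):
    dv = velocity_resolution_km_s(channel_khz, 100.0)
    print(f"{channel_khz:5.0f} kHz at 100 MHz -> ~{dv:.0f} km/s per channel")
```

A 1 kHz channel at 100 MHz corresponds to about 3 km/s, compared with about 300 km/s for a
100 kHz channel, which illustrates why narrow absorption lines call for the finer resolution.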